Bug #761065 “[Sandybridge] Spurious “*ERROR* Hangcheck timer ela...” : Bugs : xserver-xorg-video-intel package : Ubuntu

Robert Hooker (sarvatt) wrote on 2011-04-14:

#1

BootDmesg.txt (52.6 KiB, text/plain; charset="utf-8")
Dependencies.txt (2.7 KiB, text/plain; charset="utf-8")
GconfCompiz.txt (36.7 KiB, text/plain; charset="utf-8")
GdmLog.gz (4.6 KiB, application/x-gzip)
GdmLog1.gz (4.6 KiB, application/x-gzip)
GdmLog2.gz (4.6 KiB, application/x-gzip)
Lspci.txt (11.9 KiB, text/plain; charset="utf-8")
Lsusb.txt (418 bytes, text/plain; charset="utf-8")
ProcCpuinfo.txt (3.5 KiB, text/plain; charset="utf-8")
ProcInterrupts.txt (1.8 KiB, text/plain; charset="utf-8")
ProcModules.txt (2.0 KiB, text/plain; charset="utf-8")
UdevDb.txt (108.4 KiB, text/plain; charset="utf-8")
UdevLog.txt (267.6 KiB, text/plain; charset="utf-8")
UnitySupportTest.txt (614 bytes, text/plain; charset="utf-8")
XorgLog.gz (6.1 KiB, application/x-gzip)
XorgLogOld.gz (6.3 KiB, application/x-gzip)
Xrandr.txt (4.7 KiB, text/plain; charset="utf-8")
drirc.txt (768 bytes, text/plain; charset="utf-8")
peripherals.txt (1.6 KiB, text/plain; charset="utf-8")
xdpyinfo.txt (9.7 KiB, text/plain; charset="utf-8")
xinput.txt (834 bytes, text/plain; charset="utf-8")

tags:

added: hwe-blocker

Bryce Harrington (bryce) on 2011-04-14

Changed in xserver-xorg-video-intel (Ubuntu):
status:	New → Triaged

Bryce Harrington (bryce) wrote on 2011-04-14:

#2

This is almost certainly a kernel issue, but rather than close the X driver task I'll set the priority to Medium, so we can keep track of the issue on the X side.

Changed in xserver-xorg-video-intel (Ubuntu):
importance:	Undecided → Medium

Adilson Oliveira (agoliveira) wrote on 2011-04-14:

#3

I have the same issue here but I discovered that, if I run a 3D program like celestia, nexuiz or even a 3d screensaver like one with the wireframe ant, the problem vanishes with a -39 kernel. In a discussion with Chris Van Hoof today he tells me he thinks the new kernel just masks the error but, at least to the user's POV it actually solves it for the common applications the users run. I kept nexuiz running for more than 2 hours and haven't noticed a single freeze.
I would like to ask the priority to be raised as this can affect many new systems.

Changed in xserver-xorg-video-intel (Ubuntu):
status:	Triaged → Confirmed
tags:	added: kernel-bug

Bug Watch Updater (bug-watch-updater) on 2011-04-14

Changed in linux:
importance:	Unknown → Medium
status:	Unknown → Confirmed

Bryce Harrington (bryce) on 2011-04-15

Changed in xserver-xorg-video-intel (Ubuntu):
status:	Confirmed → Triaged

Bryce Harrington (bryce) wrote on 2011-04-15:

#4

"I would like to ask the priority to be raised as this can affect many new systems."

@Adilson, no, like I mentioned it's a kernel bug, not X. I was just leaving the X task open (but set to Medium) so that the X team could track the issue with their bugs, give it attention and help towards getting it resolved, even though there is no actual work needed to be done on the X side.

However, your comment that the issue goes away in -39 proves that it really is a kernel issue, and your comment about the priority makes me realize having the X task open is just causing confusion, so I'll just cancel it out. Hopefully you can get some attention on it from the kernel team.

Changed in xserver-xorg-video-intel (Ubuntu):
status:	Triaged → Invalid

Adilson Oliveira (agoliveira) wrote on 2011-04-15:

#5

@Bryce: Sorry, I didn't realize this bug was against X. It is indeed related to the kernel, that's why I tagged with kernel-bug.

Adilson Oliveira (agoliveira) wrote on 2011-04-15:

#6

Another thing I forgot to add: if one starts the session in Natty with classic Ubuntu without any effects, even with the current kernel, the problem does not appear.

Felix Engemann (felix-engemann) wrote on 2011-04-23:

#7

I have the exact same bug on my Sandy Bridge machine: i7-2600S on Zotac H67-ITX. Kernel: 2.6.38-8-generic x86_64

Felix Engemann (felix-engemann) wrote on 2011-04-23:

#8

Isn't this one a duplicate of #762855?

Felix Engemann (felix-engemann) wrote on 2011-04-25:

#9

I have also tested this on an ASRock H67M-ITX with i3-2100T and i5-2500T. Exact same symptom. I'm now pretty sure all "Sandy Bridge" internal GPUs are affected, at least with the H67 chipset. This looks serious to me, as almost all new PCs are sold with "Sandy Bridge". Using the development kernel 2.6.39 isn't a good solution for "Sandy Bridge" owners, I guess. Any chance of getting the patch from 2.6.39 backported to 2.6.38? I'm willing to help with testing.

AceLan Kao (acelankao) wrote on 2011-04-26:

#10

Hi,

This is a new natty kernel into which I cherry-picked 4 patches from the .39 kernel.
Please try it and check whether it fixes the problem, thanks.
http://people.canonical.com/~acelan/bugs/lp753189/

Felix Engemann (felix-engemann) wrote on 2011-04-26:

#11

Thx for the new Kernel AceLan!

I tested the x86_64 version on two different boards. It got a little better, I think; the bug doesn't happen as frequently, imho.
But if I e.g. press alt-f2 in Unity and start typing, I still get lags and dmesg shows:

[ 85.166147] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 39358, at 39358], missed IRQ?
[ 88.283088] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 40776, at 40776], missed IRQ?
[ 109.272320] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 56389, at 56389], missed IRQ?
[ 111.370250] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 56943, at 56943], missed IRQ?

It's easier to reproduce if you play an OpenGL game, e.g. neverball.
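
For anyone collecting these messages from several machines, the log lines quoted above can be pulled apart with a small script. This is only a sketch: the regular expression is derived from the message format as it appears in this thread, and the helper name `parse_hangcheck` is made up for illustration. In every line quoted here the "waiting on" and "at" numbers are equal, which would be consistent with the ring having already reached the sequence number the waiter wanted.

```python
import re

# Pattern built from the "missed IRQ?" lines quoted in this thread (an assumption,
# not taken from the kernel source).
HANGCHECK_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm:i915_hangcheck_ring_idle\]\s+\*ERROR\*\s+"
    r"Hangcheck timer elapsed\.\.\.\s+(?P<ring>\w+) ring idle\s+"
    r"\[waiting on (?P<waiting>\d+), at (?P<at>\d+)\]"
)


def parse_hangcheck(line):
    """Return timestamp, ring name and both sequence numbers, or None if no match."""
    match = HANGCHECK_RE.search(line)
    if match is None:
        return None
    return {
        "time": float(match.group("time")),
        "ring": match.group("ring"),
        "waiting": int(match.group("waiting")),
        "at": int(match.group("at")),
    }


sample = (
    "[ 85.166147] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer "
    "elapsed... blt ring idle [waiting on 39358, at 39358], missed IRQ?"
)
event = parse_hangcheck(sample)
print(event)
```

Because the pattern uses `search`, it should also match syslog lines that carry a date and host prefix in front of the kernel timestamp.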

Jérôme Petazzoni (jerome-petazzoni) wrote on 2011-04-26:

#12

On my setup (Dell 6420 with integrated HD3000; shows as 8086:0126 (rev 09) in lspci; running Debian 2.6.38-2), the bug appears only after a while (so it's hard to reproduce).

I tried your new kernel anyway. I got the error just once, after the boot:
[ 73.270170] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 11744, at 11744], missed IRQ?
I will stick with this kernel until the bug happens again (or not ;-))
Just tried neverball for a couple of minutes, it didn't trigger the bug.

Note: I don't know if it's relevant, but since I installed this kernel, opening the lid of the laptop will turn the screen to black. The screen remains OK when I close the lid, but when I re-open it, it goes black. It's not off (I still see the backlight), just black. It is then stuck in that state: switching back to textmode won't help. However, if I suspend to RAM and resume, the screen will work again. Before using this kernel, this bug occurred exactly 50% of the time (i.e., I could close+open once, and the 2nd time would trigger the bug).

Felix Engemann (felix-engemann) on 2011-04-26

Changed in linux (Ubuntu):
status:	New → Confirmed

Felix Engemann (felix-engemann) wrote on 2011-04-26:

#13

Seems there is yet another duplicate of this bug: https://bugs.launchpad.net/ubuntu/+bug/753189

Felix Engemann (felix-engemann) wrote on 2011-04-26:

#14

Also seems to affect Fedora 14 Kernel 2.6.38:

https://bugzilla.redhat.com/show_bug.cgi?id=684097

Felix Engemann (felix-engemann) wrote on 2011-04-26:

#15

@AceLan Kao

I think I found the i915 patch, which would fix this bug (along with a lot of other documented i915 fixes):

from (http://www.kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.39-rc1)

commit 36d527deadf7d0c302e3452dde39465e74a65a08
Author: Chris Wilson <email address hidden>
Date: Sat Mar 19 22:26:49 2011 +0000

    drm/i915: Restore missing command flush before interrupt on BLT ring

    We always skipped flushing the BLT ring if the request flush did not
    include the RENDER domain. However, this neglects that we try to flush
    the COMMAND domain after every batch and before the breadcrumb interrupt
    (to make sure the batch is indeed completed prior to the interrupt
    firing and so insuring CPU coherency). As a result of the missing flush,
    incoherency did indeed creep in, most notable when using lots of command
    buffers and so potentially rewritting an active command buffer (i.e.
    the GPU was still executing from it even though the following interrupt
    had already fired and the request/buffer retired).

    As all ring->flush routines now have the same preconditions, de-duplicate
    and move those checks up into i915_gem_flush_ring().

    Fixes gem_linear_blit.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35284
    Signed-off-by: Chris Wilson <email address hidden>
    Reviewed-by: Daniel Vetter <email address hidden>
    Tested-by: <email address hidden>

Felix Engemann (felix-engemann) wrote on 2011-04-27:

#16

OK, forget the post above. It seems to have nothing to do with this bug.

On kernel 2.6.38: /sys/module/i915/parameters/semaphores is set to 0
On kernel 2.6.39: /sys/module/i915/parameters/semaphores is set to 1

If I do the following:

sudo -i
echo 1 > /sys/module/i915/parameters/semaphores

the bug is also gone on 2.6.38.

If I set semaphores to 0 on 2.6.39, as is the default on 2.6.38, the bug also appears there.
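
To check which value a running system actually uses, the parameter file named above can simply be read back. The following is a minimal sketch; the helper name `read_module_param` is made up for illustration, and it assumes the sysfs path from this comment exists on the machine where it runs.

```python
from pathlib import Path

# Path taken from the comment above; it only exists when the i915 module is loaded.
SEMAPHORES_PARAM = Path("/sys/module/i915/parameters/semaphores")


def read_module_param(path=SEMAPHORES_PARAM):
    """Return the parameter value as a stripped string, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


# Prints None on machines without the i915 module or without this parameter.
print("i915 semaphores:", read_module_param())
```

Reading needs no root rights. Writing the file at runtime may be rejected on some kernels, as a later comment in this thread reports.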

Avi Romanoff (aroman) wrote on 2011-04-27:

#17

I have a brand new Vostro 3550 with an i5 Sandybridge, and I have this issue. Thankfully (major props), Felix's change above resolved it. I hope this can be resolved for good; is this fix likely to land in Natty for real?

Felix Engemann (felix-engemann) wrote on 2011-04-27:

#18

@Avi Romanoff

You can put this option in /etc/default/grub to make the change permanent.
Replace the following line:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

with
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash i915.semaphores=1"
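
The same edit can be scripted. The sketch below works on the file contents as a string and does not touch /etc/default/grub itself; the function name `add_kernel_param` is made up for illustration, and it assumes the value is wrapped in double quotes as in the lines above.

```python
import re


def add_kernel_param(grub_text, param):
    """Append `param` to GRUB_CMDLINE_LINUX_DEFAULT unless it is already listed."""
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)

    def repl(match):
        options = match.group(2).split()
        if param not in options:
            options.append(param)
        return match.group(1) + " ".join(options) + match.group(3)

    return pattern.sub(repl, grub_text)


# Made-up two-line sample standing in for the real file contents.
before = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
after = add_kernel_param(before, "i915.semaphores=1")
print(after)
```

Running the function twice leaves the text unchanged, so it is safe to apply repeatedly. On Ubuntu the real file still has to be written back as root and followed by `sudo update-grub` and a reboot before the parameter takes effect.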

sabyasachi roychowdhury (sabyasachir) wrote on 2011-05-02:

#19

Setting /sys/module/i915/parameters/semaphores to 1 works in Ubuntu and Fedora. But in Fedora (I tested on F15 with 2.6.38.4), it resets to 0 after a reboot. Is there a way to permanently set semaphores to 1 in Fedora?

Otto Kekäläinen (otto) wrote on 2011-05-06:

#20

Comment #16 fixes it on my Dell E5420.

The funny thing is that normally the error message appears randomly, but when I insert my Nokia CS-17 3G modem, the error always appears three times in a row:
[ 3678.937434] usb 2-1.2: new high speed USB device using ehci_hcd and address 9
[ 3679.050002] cdc_acm 2-1.2:1.1: ttyACM0: USB ACM device
[ 3679.051045] cdc_acm 2-1.2:1.3: ttyACM1: USB ACM device
[ 3698.657860] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 1395816, at 1395816], missed IRQ?
[ 3703.191041] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 1398151, at 1398151], missed IRQ?
[ 3704.728817] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 1398192, at 1398192], missed IRQ?
[ 3718.388454] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 1403486, at 1403486], missed IRQ?

I don't know how a device in a USB port generates error messages related to a video driver, but it does.

Guido Nickels (gsn) wrote on 2011-05-18:

#21

Changing the semaphores setting doesn't help here (Fujitsu S751 with 8086:0126 graphics controller).

I still get reproducible freezes. For example, when working with libreoffice, a freeze always appears after only a few actions (for example if I do some copy+paste stuff).

The only change is that instead of complaining about a missed IRQ, it is now complaining about a stuck semaphore...

syslog entry with i915.semaphores=0:

May 14 22:06:48 silk kernel: [ 1665.383643] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 443994, at 443994], missed IRQ?
May 14 22:07:51 silk kernel: [ 1727.982616] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 465770, at 465770], missed IRQ?

and with i915.semaphores=1:

May 16 20:56:32 silk kernel: [25529.161099] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
May 16 20:56:32 silk kernel: [25529.161124] [drm:kick_ring] *ERROR* Kicking stuck semaphore on render ring

I tried 2.6.38-9.43 from natty as well as 2.6.39-0.5~20110427 from ppa:kernel-ppa/ppa but no difference.
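
The two failure signatures above can be told apart mechanically when going through a long syslog. This is a rough sketch: the marker strings are copied from the log lines in this comment, while the category names and the helpers `classify` and `summarize` are made up for illustration.

```python
from collections import Counter

# Marker substrings copied from the messages quoted in this thread.
SIGNATURES = {
    "missed_irq": "missed IRQ?",
    "gpu_hung": "Hangcheck timer elapsed... GPU hung",
    "stuck_semaphore": "Kicking stuck semaphore",
}


def classify(line):
    """Return the category name for a log line, or None for unrelated lines."""
    for kind, marker in SIGNATURES.items():
        if marker in line:
            return kind
    return None


def summarize(lines):
    """Count how often each known signature appears in an iterable of log lines."""
    return Counter(kind for kind in map(classify, lines) if kind is not None)


log = [
    "May 14 22:06:48 silk kernel: [ 1665.383643] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 443994, at 443994], missed IRQ?",
    "May 16 20:56:32 silk kernel: [25529.161099] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung",
    "May 16 20:56:32 silk kernel: [25529.161124] [drm:kick_ring] *ERROR* Kicking stuck semaphore on render ring",
]
summary = summarize(log)
print(summary)
```

Comparing the counts from a boot with i915.semaphores=0 against one with i915.semaphores=1 gives a quick picture of which symptom a given machine shows.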

joshua darnell (joshuadarnell-gmail) wrote on 2011-05-18:

#22

lspci.output (8.8 KiB, text/plain)

i'm having similar issues...and it most certainly seems to be originating from the kernel and not from the X server, as i've run through three different versions of both the kernel and the X server while holding one fixed and changing the other and changing the X server didn't affect the problem at all, but changing the kernels did.

not sure what debugging information might be helpful in helping to shed light on my particular config, but would be happy to post some, just let me know what to post. for now, i'll just attach an lspci and give a bit of a description:

(1) started with "latest version" of ubuntu iso on thinkpad x220 with corei7-2620m and intel HD3000. the iso at the time provided me with kernel and headers version 2.6.38-8.42.  aside from some thinkpad-tinkering, everything pretty much worked "out of the box", as others have reported.

(2) then, connected to dell u2711 via displayport and got a desktop that was seemingly split between two windows, and everything on the u2711 was blacked out except for the top menu and a square sitting on the lower left corner of the panel the size of the thinkpad's display. thinkpad display unresponsive during this time (but not black). rebooted with display connected and both screens black (and unusable).

(3) tried to upgrade to kernel 2.6.39-1.6, which had the same outcome as kernel 2.6.39-2.7. upon reboot, monitors set to mirror same image with 1024x768 resolution. i know this is unrelated, but here and with the 2.6.38-8.42 kernel, the Fn-F7 worked for switching between resolutions. anyway, upon switching between resolutions i got the same outcome as in (2), except that now i could use the Fn-F7 to switch between non-working full-resolutions and working (mirrored) 1024x768 resolutions.

(4) after a bit of digging around, found a recommendation to try kernel 2.6.38-996-generic #201103251543 SMP Fri Mar 25 15:47:37 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux, and accompanying headers, which worked properly. currently using xorg-server 2:1.10.1+git20110429+server-1.10-branch.b4455b11-0ubuntu0sarvatt, but switching to 2.14.0ubuntu7.1 from the natty updates, as well as using 2.14-0-4ubuntu7 (natty) made absolutely no difference. these three xorg versions were used in all my testing.

At (4) both displays function properly. however, i cannot switch between monitors without rebooting. whatever monitor i boot in works, but if that's the u2711 fed through the display port, then if i unplug the display port cable, the thinkpad LCD is not switched to automatically (and remains off, not just black). likewise, if i boot with the thinkpad display active, plugging in the u2711 9-times-out-of-10 causes both screens to go black, the Fn-F7 doesn't work, and to fix the problem i have to hard-reboot. it's interesting to note that in the gnome monitors panel, only the active display shows. in other words, if i boot while connected to the u2711, it shows (but the built-in laptop screen doesn't), and if i boot without the external display connected, the thinkpad LCD shows, but once i plug in the u2711 (without rebooting) the 1 time out of 10 the display doesn't go nuts on me, it only shows the u2711 and not both the u2711 and the thinkpad LCD, as i would expect. a strange aside, the "thinkvantage" button works for every kernel change except (4), the one in which both displays work properly. additionally, i had to recompile the r8192ce_pci wireless module from the DKMS package to get it to work with each kernel change as this driver was not included in the newer kernels.

Raul Dias (rsd) wrote on 2011-05-21:

#23

Doing what Felix proposed in #16 didn't work on natty (2.6.38-8-generic):
# echo 1 > /sys/module/i915/parameters/semaphores
-bash: echo: write error: Invalid argument

However, doing what was proposed in #18 solved the problem.

XPS L502X with optimus/bumblebee working

Jeremy Foshee (jeremyfoshee) on 2011-05-24

Changed in linux (Ubuntu):
importance:	Undecided → Medium

joshua darnell (joshuadarnell-gmail) wrote on 2011-05-24:

#24

quick follow up to post #22...went over to 2.6.37-gentoo-r4 and compiled in i915 support. works fine. can post kernel options if necessary.

vilmos (vilmos) wrote on 2011-05-30:

#25

Using the latest 2.6.38 kernel from Natty stable, this message popped up in dmesg a few times every hour (my system always coming to a complete halt for 1-2 seconds), and after a few hours my system eventually froze (this is a Lenovo X220 with Sandy Bridge). Setting the semaphore parameter to "1" as suggested above helped somewhat, but I still get the hangcheck errors and freezes.

Installing 2.6.39-0.5~20110427 from kernel-ppa improved the situation, but eventually, after a bit more than a day, my system froze again with the following message in dmesg:

May 30 08:02:31 x220 kernel: [26264.491209] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
May 30 08:02:31 x220 kernel: [26264.491229] [drm:kick_ring] *ERROR* Kicking stuck semaphore on blt ring
May 30 08:02:32 x220 kernel: [26265.987805] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
May 30 08:02:32 x220 kernel: [26265.987828] [drm:kick_ring] *ERROR* Kicking stuck semaphore on blt ring
May 30 08:02:34 x220 kernel: [26267.484409] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
May 30 08:02:34 x220 kernel: [26267.484428] [drm:kick_ring] *ERROR* Kicking stuck semaphore on blt ring
May 30 08:02:35 x220 kernel: [26268.981019] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
May 30 08:02:35 x220 kernel: [26268.981039] [drm:kick_ring] *ERROR* Kicking stuck semaphore on blt ring
May 30 08:02:37 x220 kernel: [26270.477625] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed...May 30 08:04:43 x220 kernel: imklog 4.6.4, log source = /proc/kmsg started.

Semaphore is set to "1" by default on 2.6.39, although it seems from the kernel git log that this change has been reverted since. Now I'm testing 2.6.39 with semaphore set to "0" to see if it eliminates the problem.

vilmos (vilmos) wrote on 2011-05-30:

#26

With semaphore set to "0" on 2.6.39-0.5~20110427:

[ 5136.145217] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 1132622, at 1132622], missed IRQ?
[ 5382.346399] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 1242913, at 1242913], missed IRQ?
[ 5551.951435] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 1327817, at 1327817], missed IRQ?

It's actually easy to reproduce this (at least on my system): just click the ubuntu menu in the upper left corner, and start typing in the search box until the UI freezes for 1-2 seconds.

si14 (a2alt) wrote on 2011-06-01:

#27

Is there any hope to get this fixed?

Pascal Hartig (passy) wrote on 2011-06-01:

#28

See #18, the workaround is doing a great job for me.

si14 (a2alt) wrote on 2011-06-01:

#29

It seems working, but there is something strange:
root@si14-laptop:~# cat /etc/default/grub | grep semaph
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash i915.semaphores=1"
root@si14-laptop:~# cat /sys/module/i915/parameters/semaphores
0
And this is so after reboot.

Anton Belyaev (anton-belyaev) wrote on 2011-06-02:

#30

si14, check out the header of /etc/default/grub: it says you need to run update-grub after the modification.
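
A quick way to spot this situation is to compare what /etc/default/grub asks for with what the running kernel was actually booted with (on Linux, /proc/cmdline). The sketch below works on strings; the function names are made up for illustration and the sample command line is invented, not taken from any reporter's machine.

```python
import re


def configured_params(grub_text):
    """Return the options listed in GRUB_CMDLINE_LINUX_DEFAULT, or an empty list."""
    match = re.search(
        r'^GRUB_CMDLINE_LINUX_DEFAULT="([^"]*)"', grub_text, re.MULTILINE
    )
    return match.group(1).split() if match else []


def not_yet_active(grub_text, running_cmdline):
    """List configured options that are missing from the running kernel command line."""
    running = set(running_cmdline.split())
    return [param for param in configured_params(grub_text) if param not in running]


grub_text = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash i915.semaphores=1"\n'
# Invented sample of a command line booted before update-grub was run.
running = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet splash\n"
pending = not_yet_active(grub_text, running)
print(pending)
```

On a real system the two strings would come from reading /etc/default/grub and /proc/cmdline. A non-empty result suggests the GRUB configuration has not been regenerated, or the machine has not been rebooted since.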

si14 (a2alt) wrote on 2011-06-02:

#31

Thanks, you are right.

vilmos (vilmos) wrote on 2011-06-04:

#32

Yesterday I updated from the xorg-edgers ppa and the bug seems to be gone now. I even set the semaphore parameter to "0", which was a sure way to trigger this problem, and now there are no error messages about GPU hangs in dmesg.

This Phoronix article also suggests that this problem was fixed in mesa 7.11-devel, not in the kernel:

http://www.phoronix.com/scan.php?page=article&item=intel_snb_natty&num=1

Right now I have the latest xorg-edgers packages with kernel 2.6.39-0.5~20110427 from kernel-ppa , and everything is running smoothly.

Any Ubuntu developers care to track down which mesa commit(s) should be backported to Natty? Would be great if I could go back to stable Natty packages.

Pascal Hartig (passy) wrote on 2011-06-04:

#33

@vilmos Did you notice any change regarding the power consumption? I read that the issue is related to power saving and I wonder if the fix has any impact on that.

vilmos (vilmos) wrote on 2011-06-04:

#34

@Pascal no I did not, however I updated the packages only yesterday, so I can't really tell yet. I did not make any measurements on battery life before the update either, so I don't think I'll be able to give you exact numbers on any change on battery life.

theghost (theghost) wrote on 2011-06-06:

#35

Workaround from #18 works fine here.

The system no longer gets stuck. It would be nice to see a fix for this issue by default in Ubuntu, especially for new users with new hardware and little knowledge.

tobyS (tobias-schlitt) wrote on 2011-06-09:

#36

Fix #16 works fine so far here (2.6.38-8-generic, X220, i7-2620M Sandybridge). Thanks!

However, I'm additionally affected by #22 (4): I cannot dock/undock the X220 with the Lenovo Minidock Plus, where I have 2 screens attached via display port. With a fresh boot there are initially some issues with the desktop background, which vanish after starting an app in full screen mode. However, switching between the display setups (undocked = notebook display, docked = 2 externals) is not possible without a reboot.

Bilal Akhtar (bilalakhtar) wrote on 2011-06-15:

#37

I'm using Ubuntu Oneiric with the 3.0 kernel in the repositories and the entire X stack from the xorg-edgers PPA. I don't get this message anymore (got it when I was using Natty with the main repository kernel and Natty X (not from xorg-edgers)) and the display doesn't hang anymore when typing in the Unity search dialog.

So if X is going to be updated this cycle, then this issue would be fixed for good in Oneiric. The kernel has already been upgraded to 3.0, so X is the only thing which needs an upstream upgrade.

Robert Hooker (sarvatt) wrote on 2011-06-17:

#38

Someone has posted a patch that fixes this issue on the intel-gfx mailing list, and it should hopefully be in 3.0-rc4. Afterwards we will be able to cherry-pick it into stable. I have test kernels available here that fix the issue; in my tests they also have the side effect of massively speeding up 3D with the default i915.semaphores=0. Please do post your results if you test it.

fix:
https://patchwork.kernel.org/patch/879532/

test kernels:
http://kernel.ubuntu.com/~sarvatt/lp761065/

Robert Hooker (sarvatt) on 2011-06-17

Changed in linux (Ubuntu):
status:	Confirmed → Triaged

vilmos (vilmos) wrote on 2011-06-17:

#39

Have been running the kernel from #38 for a few hours now, and it seems to fix the bug. No GPU lockups or hangs, no error messages in dmesg.

si14 (a2alt) wrote on 2011-06-17:

#40

It looks like #38 fixes the bug. But I have some errors in dmesg (they of course can be unrelated to #38):

[ 1406.247369] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro,data=ordered,discard,commit=0
[ 1406.252920] EXT4-fs (sda1): re-mounted. Opts: data=ordered,discard,commit=0
[ 1406.260472] EXT4-fs (sda5): re-mounted. Opts: user_xattr,data=ordered,discard,commit=0
[ 1406.273500] thinkpad_acpi: THERMAL ALERT: unknown thermal alarm received
[ 1406.273509] thinkpad_acpi: unhandled HKEY event 0x6040
[ 1406.273514] thinkpad_acpi: please report the conditions when this event happened to <email address hidden>
[ 1406.274433] thinkpad_acpi: EC reports that Thermal Table has changed

vilmos (vilmos) wrote on 2011-06-17:

#41

@si14: that's an unrelated issue and harmless. Also happens with the latest stable natty kernel.

Steffen Rusitschka (rusi) wrote on 2011-06-18:

#42

#38 fixes it for me, too (semaphores=0). Thanks!

Bug Watch Updater (bug-watch-updater) on 2011-06-19

Changed in linux:
status:	Confirmed → Fix Released

Jonathan Davies (jpds) wrote on 2011-06-20:

#43

The proposed patched kernel has been working fine for me for two days.

Robert Hooker (sarvatt) wrote on 2011-06-21:

#44

This has been released in 3.0-rc4

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=498e720b96379d8ee9c294950a01534a73defcf3

Robert Hooker (sarvatt) on 2011-06-21

description:

updated

Tim Gardner (timg-tpi) on 2011-06-22

Changed in xserver-xorg-video-intel (Ubuntu Natty):
status:	New → Invalid
Changed in linux (Ubuntu Natty):
assignee:	nobody → Robert Hooker (sarvatt)
status:	New → Fix Committed

vilmos (vilmos) wrote on 2011-06-25:

#45

After using the kernel from #38 for a few days I got a GPU hang again:

Jun 25 10:58:05 x220 kernel: [154505.782665] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Jun 25 10:58:05 x220 kernel: [154505.782788] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 79090 at 79089, next 79117)

This, however:

https://bugzilla.redhat.com/show_bug.cgi?id=684097

indicates that this needs to be applied for the BSD ring too (see second part of the fix).

@Robert, could you create a kernel package with both patches from the Red Hat bugzilla page applied? Thanks.

Julian Wiedmann (jwiedmann) wrote on 2011-06-25:

#46

The patch for the BSD ring is in Linus' tree now:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ec6a890dfed7dd245beba5e5bcdfcffbd934c284

David Sterratt (david-c-sterratt) wrote on 2011-06-29:

#47

The kernel at #38 works for me too. Thank-you!

Robert Hooker (sarvatt) wrote on 2011-06-29:

#48

vilmos: Do you even have i965-va-driver installed? If not, it won't be any help; the BSD ring is used for h264 acceleration through libva.

vilmos (vilmos) wrote on 2011-07-01:

#49

@Robert I don't know if any of my installed applications use hw h264 acceleration, but once in a while (every few days) I still experience GPU hangs even with the kernel from #38. The situation improved a LOT though with it.

Uncle Pedro (peter.a.h.peterson) wrote on 2011-07-02:

#50

I can confirm that the fix in #38 works on my X220T. I haven't updated the bios or made any other graphical changes.

B.B. Lauret (bblauret) wrote on 2011-07-08:

#51

What does 'applied' mean in this message? https://lists.ubuntu.com/archives/kernel-team/2011-June/015962.html

There isn't a new kernel in natty-updates, and in natty-proposed this bug isn't mentioned (http://kernel.ubuntu.com/~kernel-ppa/reports/sru-report.html)...

Am I misunderstanding something?

Revision history for this message

Luigi R. (xluigi84) wrote on 2011-07-12:

#52

Same error with HP DV6 6030el i7 2630 and AMD Radeon HD 6470M. I cannot use the radeon driver because Natty freezes during boot, so I have to use i915. But sometimes, at random, I get freezes with this driver too. Keyboard and mouse are locked, and Alt+SysRq REISUB doesn't work. Looking at the kernel log a few minutes before the freeze, I can find this error: [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed. Now I'm using the solution in #18 and it seems to work fine. I can't understand why this bug is classified "medium"... freezes like these force hard shutdowns with a high risk of damage to the hardware. I came close to going back to Windows just to safeguard my purchase. Please fix it!!! Thanks

Revision history for this message

Luigi R. (xluigi84) wrote on 2011-07-12:

#53

Ps: The freezes occur using Unity.

Revision history for this message

Robert Hooker (sarvatt) wrote on 2011-07-12:

#54

bblauret: it means it's applied to the pending -proposed kernel git tree. It will be in the 2.6.38-11 kernel whenever that release is started, and should hopefully be in -proposed when 2.6.38-10 migrates to -updates next week.
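
To check whether a machine is already running a kernel at or past that ABI number, a rough sketch along these lines may help. It assumes Ubuntu-style release strings such as `2.6.38-11-generic` (what `uname -r` prints); the names `parse_release`, `has_fix` and `FIXED_IN` are made up for illustration, and the check says nothing about mainline or custom kernels.

```python
import platform
import re

# Assumed threshold: the natty kernel named in this thread (2.6.38-11).
FIXED_IN = (2, 6, 38, 11)


def parse_release(release):
    """Turn '2.6.38-11-generic' into (2, 6, 38, 11); None if it does not fit."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    return tuple(int(part) for part in match.groups()) if match else None


def has_fix(release, fixed_in=FIXED_IN):
    """True if the release string is at or above the assumed fixed version."""
    parsed = parse_release(release)
    return parsed is not None and parsed >= fixed_in


if __name__ == "__main__":
    running = platform.release()
    verdict = "at or past 2.6.38-11" if has_fix(running) else "older or unrecognised"
    print(running, "->", verdict)
```

The comparison is a plain tuple comparison, so a later series such as `3.0.0-12-generic` also counts as newer.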

Revision history for this message

Luigi R. (xluigi84) wrote on 2011-07-12:

#55

I also tried the new 2.6.39 kernel from the kernel PPA repository, but the same bug occurs.

Revision history for this message

Robert Hooker (sarvatt) wrote on 2011-07-12:

#56

closing oneiric task, this was fixed in 3.0-rc4 released there some time ago

Changed in linux (Ubuntu):
status:	Triaged → Fix Released

Chris Van Hoof (vanhoof) on 2011-07-12

Changed in linux (Ubuntu Natty):
importance:	Undecided → Medium
Changed in xserver-xorg-video-intel (Ubuntu):
importance:	Medium → Undecided

Revision history for this message

Luigi R. (xluigi84) wrote on 2011-07-13:

#57

I got a total freeze again yesterday, but I don't know if it is linked to this bug. The freeze occurs when the cosmo screensaver is left running. Can you check whether it is the same for you, please?

Revision history for this message

Luigi R. (xluigi84) wrote on 2011-07-13:

#58

Sorry, Robert Hooker, I hadn't read your message. Is there a patch for natty, or do I have to wait for oneiric?

Revision history for this message

B.B. Lauret (bblauret) wrote on 2011-07-13:

#59

Robert Hooker: Thanks for your explanation. Now that 2.6.38-10 has hit -updates, I look forward to the next kernel in -proposed.

Revision history for this message

Matthias Schmidt (mschmidt) wrote on 2011-07-16:

#60

I see the strange "bsd ring behavior" when using the fix of #18 with my Thinkpad X220. Adding the semaphore option to grub renders my system unusable. The console is no longer high-res and I only see graphics errors with X, so there is no chance to work. I get the following error:

[drm:init_ring_common] *ERROR* gen6 bsd ring initialization failed ctl 0001f003 head 00000000 tail 00000000 start 00022000
[drm:i915_driver_load] *ERROR* failed to init modeset

If I boot the system w/o the grub parameter and set it afterwards (via /sys), it seems to work.

So take care if you put the fix into the grub config.
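
For anyone wanting to try the runtime route instead of the grub one, here is a minimal sketch. The comment above only says "via /sys"; the exact path below is an assumption based on the usual `/sys/module/<module>/parameters/<name>` layout, and `read_param` / `write_param` are hypothetical helper names. Writing needs root, and it only works if the parameter is writable on your kernel.

```python
from pathlib import Path

# Assumed location of the i915 semaphores parameter (not confirmed in this thread).
PARAM = Path("/sys/module/i915/parameters/semaphores")


def read_param(path=PARAM):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except (FileNotFoundError, PermissionError):
        return None


def write_param(value, path=PARAM):
    """Write a new value; raises an OSError subclass if the file is not writable."""
    path.write_text(f"{value}\n")


if __name__ == "__main__":
    print("i915.semaphores =", read_param())
    # write_param(1)  # uncomment and run as root to change it at runtime
```

A value set this way does not survive a reboot, which is the point here: a bad setting cannot break the next boot.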

Revision history for this message

Maarten Kossen (mpkossen) wrote on 2011-07-18:

#61

I've experienced this bug as well on a Clevo W150HRM with a Sandy Bridge Core i7 (2820QM). The solution in #16 fixes it for me. I have yet to try #18, but I'm sure it'll work as well. Running 2.6.38-10-generic.

Revision history for this message

Matthias Schmidt (mschmidt) wrote on 2011-07-18:

#62

The fix in #18 solves the "micro-hangs" with the Unity launcher and X in general for me, but the system still freezes after some uptime. It is completely locked up and only a hard reboot "solves" the freeze. Is anybody also seeing this?

Revision history for this message

Stephen Rees-Carter (valorin) wrote on 2011-07-19:

#63

The fix in #18 solves micro-freeze problems with OpenGL (i.e. games), but it does not solve them for normal usage.
For example, it has locked up 3 times during the writing of this message.

Also note, I have 'noapic' in my GRUB config as well, as that is the only way I can get my laptop to successfully boot.
The full line is:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash noapic i915.semaphores=1"

Bug #806434 has the full specs of my laptop for those who are interested.
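
As a quick way to see which kernel parameters a grub defaults file sets, the following sketch parses a line like the one above. It assumes the usual shell-style `KEY="value"` syntax of `/etc/default/grub`; the function name `kernel_params` is illustrative only.

```python
import shlex


def kernel_params(grub_text, key="GRUB_CMDLINE_LINUX_DEFAULT"):
    """Return the kernel parameters assigned to `key` as a list of strings."""
    for line in grub_text.splitlines():
        line = line.strip()
        if line.startswith(key + "="):
            value = line.split("=", 1)[1]
            # shlex.split removes the shell quoting; the quoted value comes
            # back as one token, so split it again on whitespace.
            return " ".join(shlex.split(value)).split()
    return []


SAMPLE = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash noapic i915.semaphores=1"'

if __name__ == "__main__":
    params = kernel_params(SAMPLE)
    print(params)
    print("semaphores set:", "i915.semaphores=1" in params)
```

On Ubuntu, changes to `/etc/default/grub` only take effect after running `update-grub` and rebooting. Note the warning in #60 before making the setting permanent.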

Revision history for this message

Herton R. Krzesinski (herton) wrote on 2011-07-19:

#64

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-natty' to 'verification-done-natty'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags:

added: verification-needed-natty

Revision history for this message

Anna Glasgall (aglasgall) wrote on 2011-07-19:

#65

2.6.38-11 in -proposed seems to fix the problem for me on my ThinkPad x220.

Revision history for this message

Matthias Schmidt (mschmidt) wrote on 2011-07-19:

#66

Can confirm this. 2.6.38-11 absolutely fixes the hangs for me. Not sure about the freezes, as this needs some time. Nevertheless, the fix should go in.

tags:

added: verification-done-natty
removed: verification-needed-natty

Revision history for this message

Andrew Lukoshko (andrew.lukoshko) wrote on 2011-07-19:

#67

Confirmed. Lenovo Thinkpad EDGE E420: 2.6.38-11 fixes the problem and Unity now works much more comfortably.

Revision history for this message

Stephen Rees-Carter (valorin) wrote on 2011-07-20:

#68

Confirmed, it fixes the semaphore problem on my ThinkPad L520 :)

It doesn't remove the need for noapic though, but that is likely a different issue.

Revision history for this message

Maarten Kossen (mpkossen) wrote on 2011-07-20:

#69

#18 doesn't fix it on 2.6.38-10, #16 does.

Revision history for this message

B.B. Lauret (bblauret) wrote on 2011-07-20:

#70

Confirmed, 2.6.38-11 fixes my issues.

Revision history for this message

Felix Engemann (felix-engemann) wrote on 2011-07-21:

#71

Confirmed! This bug (blt ring idle) seems to be fixed on 2.6.38-11. There are still issues and other GPU hangs though - after 2 seconds of playing sauerbraten in full-screen:

[ 54.562391] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 54.563417] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 36628 at 36618, next 36629)

Don't know if it's related to this bug though. Seems to be related to xserver-xorg-intel. All these Sandy Bridge GPU issues are really annoying, and this nearly 8 months after the first Sandy Bridge release ...

Revision history for this message

exactt (giesbert) wrote on 2011-07-21:

#72

@felix

Also got errors after a while with the latest proposed kernel. I think we should file a new bug report. Would you mind doing that?

[21589.558177] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[21589.559706] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 13353483 at 13353481, next 13353484)
[21595.961958] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[21595.962008] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 13353491 at 13353488, next 13353492)
[21602.395716] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[21602.395773] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 13353502 at 13353492, next 13353503)
[21608.789507] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[21608.789568] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 13353511 at 13353492, next 13353512)
[21615.183307] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[21615.183367] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 13353522 at 13353492, next 13353523)
[21617.091445] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[21617.091482] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 13353525 at 13353492, next 13353535)
[21617.091710] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[21617.091714] [drm:i915_reset] *ERROR* Failed to reset chip.
[21617.091799] compiz[1548]: segfault at 0 ip 00007f7a5b4f4be8 sp 00007fff7a7df7e0 error 6 in i965_dri.so[7f7a5b483000+ac000]
[21620.669148] compiz[6719]: segfault at 0 ip 00007f3fefdd1acc sp 00007fff7cab5e00 error 6 in i965_dri.so[7f3fefdb2000+ac000]
[21624.072709] compiz[6732]: segfault at 0 ip 00007fe1b4a73acc sp 00007fffa1461b00 error 6 in i965_dri.so[7fe1b4a54000+ac000]
[21627.516388] compiz[6735]: segfault at 0 ip 00007f9dcf6deacc sp 00007fffb462c580 error 6 in i965_dri.so[7f9dcf6bf000+ac000]
[21630.932591] compiz[6744]: segfault at 0 ip 00007fc266a55acc sp 00007fffb08f1f40 error 6 in i965_dri.so[7fc266a36000+ac000]
[21634.313294] compiz[6747]: segfault at 0 ip 00007f4ae8c08acc sp 00007fff856b7910 error 6 in i965_dri.so[7f4ae8be9000+ac000]
[21637.743397] compiz[6749]: segfault at 0 ip 00007f7bdb4a2acc sp 00007fff870d1b30 error 6 in i965_dri.so[7f7bdb483000+ac000]
[21641.178166] compiz[6751]: segfault at 0 ip 00007f4e534a2acc sp 00007fffd2903860 error 6 in i965_dri.so[7f4e53483000+ac000]
[21644.626419] compiz[6753]: segfault at 0 ip 00007f1b16c61acc sp 00007fff6fa3d4e0 error 6 in i965_dri.so[7f1b16c42000+ac000]
[21648.052081] compiz[6755]: segfault at 0 ip 00007f0566a55acc sp 00007fffc7c3e4b0 error 6 in i965_dri.so[7f0566a36000+ac000]
[21651.557582] compiz[6758]: segfault at 0 ip 00007f3a012f2acc sp 00007fffbe4fa580 error 6 in i965_dri.so[7f3a012d3000+ac000]
[21654.987556] compiz[6760]: segfault at 0 ip 00007ffc98681acc sp 00007fff3cbcbd60 error 6 in i965_dri.so[7ffc98662000+ac000]
[21658.410040] compiz[6762]: segfault at 0 ip 00007f60b431cacc sp 00007fff046c12c0 error 6 in i965_dri.so[7f60b42fd000+ac000]
[21661.837253] compiz[6764]: segfault at 0 ip 00007f70ff07eacc sp 00007fffc9066650 error 6 in i965_dri.so[7f70ff05f000+ac000]
[21665.458002] compiz[6767]: segfault at 0 ip 00007ff70ec61acc sp 00007fff2986fba0 error 6 in i965_dri.so[7ff70ec42000+ac000]
[21668.958303] compiz[6769]: segfault at 0 ip 00007f7b1205dacc sp 00007fffa944d240 error 6 in i965_dri.so[7f7b1203e000+ac000]
[21672.369707] compiz[6786]: segfault at 0 ip 00007f9aba97facc sp 00007fff1ae885e0 error 6 in i965_dri.so[7f9aba960000+ac000]
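
A long excerpt like this is easier to compare across machines once it is boiled down to a few numbers. The sketch below is one way to do that, using only the message formats quoted in this thread. It reads "awaiting X at Y" as "waiting for request X while the ring has reached Y", which is my interpretation rather than something stated here; `summarise` and `max_lag` are made-up names.

```python
import re

# Patterns taken from the log lines quoted in this thread.
HANG = re.compile(r"i915_hangcheck_elapsed\] \*ERROR\* Hangcheck timer elapsed")
WAIT = re.compile(r"awaiting (\d+) at (\d+), next (\d+)")
WEDGED = re.compile(r"declaring wedged")
SEGV = re.compile(r"(\w+)\[\d+\]: segfault .* in (\S+?)\[")


def summarise(log_text):
    """Count hang reports and segfaults in a dmesg or kern.log excerpt."""
    summary = {"hangs": 0, "max_lag": 0, "wedged": False, "segfaults": {}}
    for line in log_text.splitlines():
        if HANG.search(line):
            summary["hangs"] += 1
        wait = WAIT.search(line)
        if wait:
            awaiting, at = int(wait.group(1)), int(wait.group(2))
            # Largest gap seen between the awaited and the reached number.
            summary["max_lag"] = max(summary["max_lag"], awaiting - at)
        if WEDGED.search(line):
            summary["wedged"] = True
        segv = SEGV.search(line)
        if segv:
            key = (segv.group(1), segv.group(2))  # (process, library)
            summary["segfaults"][key] = summary["segfaults"].get(key, 0) + 1
    return summary


SAMPLE = """\
[21615.183307] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[21615.183367] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 13353522 at 13353492, next 13353523)
[21617.091710] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[21617.091799] compiz[1548]: segfault at 0 ip 00007f7a5b4f4be8 sp 00007fff7a7df7e0 error 6 in i965_dri.so[7f7a5b483000+ac000]
"""

if __name__ == "__main__":
    print(summarise(SAMPLE))
```

Feeding it the output of `dmesg` instead of `SAMPLE` gives a compact summary to attach to a new bug report alongside the full log.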

Revision history for this message

exactt (giesbert) wrote on 2011-07-21:

#73

@felix @all: already found a bug which sounds just like the problem: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/805586

Revision history for this message

Felix Engemann (felix-engemann) wrote on 2011-07-22:

#74

@exactt (giesbert)

yes sounds like this bug - although i do not experience segfaults with compiz.

Revision history for this message

Luigi R. (xluigi84) wrote on 2011-07-26:

#75

No errors found with the new kernel 2.6.38-11. I also updated xserver-xorg-video-intel from the proposed repo. So far no freezes... I hope everything has been fixed! Thanks!!!!

Revision history for this message

Matthias Schmidt (mschmidt) wrote on 2011-07-28:

#76

Just for the record: After one week of extensive testing, -11 also fixes the regular freezes. The system runs rock stable now!

Revision history for this message

Stephen Rees-Carter (valorin) wrote on 2011-07-29:

#77

After a week of testing, I'm still experiencing occasional system-wide freezes as well as frequent mini-freezes. I guess my hardware is different...

Revision history for this message

Steffen Rusitschka (rusi) wrote on 2011-07-29:

#78

-11 fixes the short hangs (no more dmesg alerts) but my system still freezes completely (reboot required) once or twice a day. Dell Latitude 5520.

Revision history for this message

cuc (cuc+) wrote on 2011-08-03:

#79

-11 fixes the short hangs, though i often have even heavier lag after standby or hibernate...
samsung rf511

Revision history for this message

Luigi R. (xluigi84) wrote on 2011-08-03:

#80

I can confirm... the total freezes still exist. :-( No error message in the kernel log. I'm destroying my new notebook by forcing the shutdown. Do you need any logs to find the solution? I am available for testing.

HP DV6 6030el i7 630

Revision history for this message

Robert Hooker (sarvatt) wrote on 2011-08-03:

#81

This bug was never about complete system hangs, just stuttering caused when using GL. Please do file new bugs if you are having hangs.

Revision history for this message

Luigi R. (xluigi84) wrote on 2011-08-03:

#82

It usually happens when I use programs like tecplot or the Fluent post-processing tool. I suppose it is related to the GL issue.

Revision history for this message

Launchpad Janitor (janitor) wrote on 2011-08-19:

#83

This bug was fixed in the package linux - 2.6.38-11.48

---------------
linux (2.6.38-11.48) natty-proposed; urgency=low

[ Herton R. Krzesinski ]

  * Release Tracking Bug
    - LP: #818175

[ Upstream Kernel Changes ]

  * Revert "HID: magicmouse: ignore 'ivalid report id' while switching
    modes"
    - LP: #814250

linux (2.6.38-11.47) natty-proposed; urgency=low

[ Steve Conklin ]

  * Release Tracking Bug
    - LP: #811180

[ Keng-Yu Lin ]

  * SAUCE: Revert: "dell-laptop: Toggle the unsupported hardware
    killswitch"
    - LP: #775281

[ Ming Lei ]

  * SAUCE: fix yama_ptracer_del lockdep warning
    - LP: #791019

[ Stefan Bader ]

  * SAUCE: Re-enable RODATA for i386 virtual
    - LP: #809838

[ Tim Gardner ]

  * [Config] Add grub-efi as a recommended bootloader for server and
    generic
    - LP: #800910
  * SAUCE: rtl8192se: Force a build for a 2.6/3.0 kernel
    - LP: #805494

[ Upstream Kernel Changes ]

  * Revert "bridge: Forward reserved group addresses if !STP"
    - LP: #793702
  * Fix up ABI directory
  * bonding: Incorrect TX queue offset, CVE-2011-1581
    - LP: #792312
    - CVE-2011-1581
  * fs/partitions/efi.c: corrupted GUID partition tables can cause kernel
    oops
    - LP: #795418
    - CVE-2011-1577
  * usbnet/cdc_ncm: add missing .reset_resume hook
    - LP: #793892
  * ath5k: Disable fast channel switching by default
    - LP: #767192
  * mm: vmscan: correctly check if reclaimer should schedule during
    shrink_slab
    - LP: #755066
  * mm: vmscan: correct use of pgdat_balanced in sleeping_prematurely
    - LP: #755066
  * ALSA: hda - Use LPIB for ATI/AMD chipsets as default
    - LP: #741825
  * ALSA: hda - Enable snoop bit for AMD controllers
    - LP: #741825
  * ALSA: hda - Enable sync_write workaround for AMD generically
    - LP: #741825
  * cpuidle: menu: fixed wrapping timers at 4.294 seconds
    - LP: #774947
  * drm/i915: Fix gen6 (SNB) missed BLT ring interrupts.
    - LP: #761065
  * USB: ehci: remove structure packing from ehci_def
    - LP: #791552
  * drm/i915: disable PCH ports if needed when disabling a CRTC
    - LP: #791752
  * kmemleak: Do not return a pointer to an object that kmemleak did not
    get
    - LP: #793702
  * kmemleak: Initialise kmemleak after debug_objects_mem_init()
    - LP: #793702
  * Fix _OSC UUID in pcc-cpufreq
    - LP: #793702
  * CPU hotplug, re-create sysfs directory and symlinks
    - LP: #793702
  * Fix memory leak in cpufreq_stat
    - LP: #793702
  * net: recvmmsg: Strip MSG_WAITFORONE when calling recvmsg
    - LP: #793702
  * ftrace: Only update the function code on write to filter files
    - LP: #793702
  * qla2xxx: Fix hang during driver unload when vport is active.
    - LP: #793702
  * qla2xxx: Fix virtual port failing to login after chip reset.
    - LP: #793702
  * qla2xxx: Fix vport delete hang when logins are outstanding.
    - LP: #793702
  * powerpc/kdump64: Don't reference freed memory as pacas
    - LP: #793702
  * powerpc/kexec: Fix memory corruption from unallocated slaves
    - LP: #793702
  * x86, cpufeature: Fix cpuid leaf 7 feature detection
    - LP: #793702
  * ath9k_hw: do noise floor calibration only on required chains
    - LP: #793702
  * ath9k_hw: fix power for the HT40 duplicate frames
    - LP: #793702
  * ath9k_hw: fix dual band assumption for XB113
    - LP: #793702
  * ath9k_hw: Fix STA connection issues with AR9380 (XB113).
    - LP: #793702
  * powerpc: Set nr_cpu_ids early and use it to free PACAs
    - LP: #793702
  * powerpc/oprofile: Handle events that raise an exception without
    overflowing
    - LP: #793702
  * iwlagn: fix iwl_is_any_associated
    - LP: #793702
  * block: rescan partitions on invalidated devices on -ENOMEDIA too
    - LP: #793702
  * block: move bd_set_size() above rescan_partitions() in __blkdev_get()
    - LP: #793702
  * paride: Convert to bdops->check_events()
    - LP: #793702
  * gdrom,viocd: Convert to bdops->check_events()
    - LP: #793702
  * ide: Convert to bdops->check_events()
    - LP: #793702
  * block: don't block events on excl write for non-optical devices
    - LP: #793702
  * block: Fix discard topology stacking and reporting
    - LP: #793702
  * block: add proper state guards to __elv_next_request
    - LP: #793702
  * block: always allocate genhd->ev if check_events is implemented
    - LP: #793702
  * mtd: mtdconcat: fix NAND OOB write
    - LP: #793702
  * mtd: return badblockbits back
    - LP: #793702
  * x86, 64-bit: Fix copy_[to/from]_user() checks for the userspace address
    limit
    - LP: #793702
  * ext4: fix possible use-after-free in ext4_remove_li_request()
    - LP: #793702
  * iwlwifi: fix bugs in change_interface
    - LP: #793702
  * nl80211: Fix set_key regression with some drivers
    - LP: #793702
  * mac80211: fix a few RCU issues
    - LP: #793702
  * wire up fanotify syscalls
    - LP: #793702
  * wire up clock_adjtime syscall
    - LP: #793702
  * drm: Send pending vblank events before disabling vblank.
    - LP: #793702
  * pata_cm64x: fix boot crash on parisc
    - LP: #793702
  * ext3: Fix fs corruption when make_indexed_dir() fails
    - LP: #793702
  * jbd: Fix forever sleeping process in do_get_write_access()
    - LP: #793702
  * jbd: fix fsync() tid wraparound bug
    - LP: #793702
  * ext4: release page cache in ext4_mb_load_buddy error path
    - LP: #793702
  * bonding: 802.3ad - fix agg_device_up
    - LP: #793702
  * bridge: fix forwarding of IPv6
    - LP: #793702
  * ieee802154: Remove hacked CFLAGS in net/ieee802154/Makefile
    - LP: #793702
  * irda: fix locking unbalance in irda_sendmsg
    - LP: #793702
  * inetpeer: reduce stack usage
    - LP: #793702
  * ipv6: Remove hoplimit initialization to -1
    - LP: #793702
  * ipv6: udp: fix the wrong headroom check
    - LP: #793702
  * macvlan: fix panic if lowerdev in a bond
    - LP: #793702
  * net: Do not wrap sysctl igmp_max_memberships in IP_MULTICAST
    - LP: #793702
  * net: use hlist_del_rcu() in dev_change_name()
    - LP: #793702
  * SCTP: fix race between sctp_bind_addr_free() and
    sctp_bind_addr_conflict()
    - LP: #793702
  * tcp: len check is unnecessarily devastating, change to WARN_ON
    - LP: #793702
  * vlan: fix GVRP at dismantle time MIME-Version: 1.0
    - LP: #793702
  * igmp: call ip_mc_clear_src() only when we have no users of ip_mc_list
    - LP: #793702
  * net: add skb_dst_force() in sock_queue_err_skb()
    - LP: #793702
  * sch_sfq: avoid giving spurious NET_XMIT_CN signals
    - LP: #793702
  * sctp: fix memory leak of the ASCONF queue when free asoc
    - LP: #793702
  * sch_sfq: fix peek() implementation
    - LP: #793702
  * bonding: prevent deadlock on slave store with alb mode (v3)
    - LP: #793702
  * mpt2sas: move even handling of MPT2SAS_TURN_ON_FAULT_LED into process
    context
    - LP: #793702
  * bnx2i: Fixed packet error created when the sq_size is set to 16
    - LP: #793702
  * bnx2i: Updated the connection shutdown/cleanup timeout
    - LP: #793702
  * Fix Ultrastor asm snippet
    - LP: #793702
  * target: Fix multi task->task_sg[] chaining logic bug
    - LP: #793702
  * target: Fix interrupt context bug with stats_lock and
    core_tmr_alloc_req
    - LP: #793702
  * target: Fix bug with task_sg chained transport_free_dev_tasks release
    - LP: #793702
  * target: Fix task->task_execute_queue=1 clear bug + LUN_RESET OOPs
    - LP: #793702
  * x86, ioapic: Fix potential resume deadlock
    - LP: #793702
  * x86, amd: Do not enable ARAT feature on AMD processors below family
    0x12
    - LP: #793702
  * x86, amd: Use _safe() msr access for GartTlbWlk disable code
    - LP: #793702
  * x86, cpufeature: Update CPU feature RDRND to RDRAND
    - LP: #793702
  * oprofile, x86: Enable preemption during pci device setup in IBS init
    - LP: #793702
  * rcu: Fix unpaired rcu_irq_enter() from locking selftests
    - LP: #793702
  * When mandatory encryption on share, fail mount
    - LP: #793702
  * staging: usbip: fix wrong endian conversion
    - LP: #793702
  * staging: r8712u: Fix driver to support ad-hoc mode
    - LP: #793702
  * Fix for buffer overflow in ldm_frag_add not sufficient
    - LP: #793702
  * seqlock: Don't smp_rmb in seqlock reader spin loop
    - LP: #793702
  * md: Fix race when creating a new md device.
    - LP: #793702
  * md/bitmap: fix saving of events_cleared and other state.
    - LP: #793702
  * ALSA: HDA: Use one dmic only for Dell Studio 1558
    - LP: #731706, #793702
  * ALSA: HDA: Add quirk for Lenovo U350
    - LP: #751681, #793702
  * ALSA: hda - Fix input-src parse in patch_analog.c
    - LP: #793702
  * ASoC: Ensure output PGA is enabled for line outputs in wm_hubs
    - LP: #793702
  * ASoC: Add some missing volume update bit sets for wm_hubs devices
    - LP: #793702
  * HID: magicmouse: ignore 'ivalid report id' while switching modes
    - LP: #793702
  * mm/page_alloc.c: prevent unending loop in __alloc_pages_slowpath()
    - LP: #793702
  * loop: limit 'max_part' module param to DISK_MAX_PARTS
    - LP: #793702
  * loop: handle on-demand devices correctly
    - LP: #793702
  * i2c/writing-clients: Fix foo_driver.id_table
    - LP: #793702
  * USB: CP210x Add 4 Device IDs for AC-Services Devices
    - LP: #793702
  * USB: moto_modem: Add USB identifier for the Motorola VE240.
    - LP: #793702
  * USB: serial: ftdi_sio: adding support for TavIR STK500
    - LP: #793702
  * USB: gadget: g_multi: fixed vendor and product ID in inf files
    - LP: #793702
  * USB: gamin_gps: Fix for data transfer problems in native mode
    - LP: #793702
  * Bind only modem AT command endpoint to option module.
    - LP: #793702
  * USB: cdc_acm: Fix oops when Droids MuIn LCD is connected
    - LP: #793702
  * xhci: Fix bug in control transfer cancellation.
    - LP: #793702
  * usb/gadget: at91sam9g20 fix end point max packet size
    - LP: #793702
  * usb: gadget: rndis: don't test against req->length
    - LP: #793702
  * xhci: Fix memory leak in ring cache deallocation.
    - LP: #793702
  * xhci: Fix memory leak bug when dropping endpoints
    - LP: #793702
  * USB: option: add support for Huawei E353 device
    - LP: #793702
  * OHCI: fix regression caused by nVidia shutdown workaround
    - LP: #793702
  * USB: remove remaining usages of hcd->state from usbcore and fix
    regression
    - LP: #793702
  * cx88: protect per-device driver list with device lock
    - LP: #793702
  * cx88: fix locking of sub-driver operations
    - LP: #793702
  * cx88: hold device lock during sub-driver initialization
    - LP: #793702
  * sh: clkfwk: fixup clk_rate_table_build parameter in div6 clock
    - LP: #793702
  * sh: fixup fpu.o compile order
    - LP: #793702
  * p54usb: add zoom 4410 usbid
    - LP: #793702
  * eCryptfs: Allow 2 scatterlist entries for encrypted filenames
    - LP: #793702
  * UBIFS: fix a rare memory leak in ro to rw remounting path
    - LP: #793702
  * kbuild: Fix GNU make v3.80 compatibility
    - LP: #793702
  * i8k: Avoid lahf in 64-bit code
    - LP: #793702
  * idle governor: Avoid lock acquisition to read pm_qos before entering
    idle
    - LP: #793702
  * dm table: reject devices without request fns
    - LP: #793702
  * ARM: 6941/1: cache: ensure MVA is cacheline aligned in
    flush_kern_dcache_area
    - LP: #793702
  * tmpfs: fix race between truncate and writepage
    - LP: #793702
  * atm: expose ATM device index in sysfs
    - LP: #793702
  * brd: limit 'max_part' module param to DISK_MAX_PARTS
    - LP: #793702
  * brd: handle on-demand devices correctly
    - LP: #793702
  * drm/i915: fix user irq miss in BSD ring on g4x
    - LP: #793702
  * drm/radeon/evergreen/btc/fusion: setup hdp to invalidate and flush when
    asked
    - LP: #793702
  * drm/radeon/kms: add wait idle ioctl for eg->cayman
    - LP: #793702
  * SUNRPC: Deal with the lack of a SYN_SENT sk->sk_state_change
    callback...
    - LP: #793702
  * NFSv4: Handle expired stateids when the lease is still valid
    - LP: #793702
  * NFSv4.1: Fix the handling of NFS4ERR_SEQ_MISORDERED errors
    - LP: #793702
  * PCI: Add quirk for setting valid class for TI816X Endpoint
    - LP: #793702
  * xen mmu: fix a race window causing leave_mm BUG()
    - LP: #793702
  * ext4: Use schedule_timeout_interruptible() for waiting in lazyinit
    thread
    - LP: #793702
  * AppArmor: fix oops in apparmor_setprocattr
    - LP: #793702
  * Linux 2.6.38.8
    - LP: #793702
  * xhci: Add defines for hardcoded slot states
    - LP: #802541
  * xhci: Do not issue device reset when device is not setup
    - LP: #802541
  * taskstats: don't allow duplicate entries in listener mode,
    CVE-2011-2484
    - LP: #806390
    - CVE-2011-2484
  * ext4: init timer earlier to avoid a kernel panic in __save_error_info,
    CVE-2011-2493
    - LP: #806929
    - CVE-2011-2493
  * acer-wmi: does not poll device status when WMI event is available
    - LP: #771758
  * acer-wmi: Only update rfkill status for associated hotkey events
    - LP: #771758
  * (drop after 2.6.38) acer-wmi: Add support for Aspire 1830 wlan hotkey
    - LP: #771758
  * mm: vmscan: correct check for kswapd sleeping in sleeping_prematurely
    - LP: #808509
  * mm: vmscan: kswapd should not free an excessive number of pages when
    balancing small zones
    - LP: #808509
  * mm: vmscan: do not apply pressure to slab if we are not applying
    pressure to zone
    - LP: #808509
  * mm: vmscan: evaluate the watermarks against the correct classzone
    - LP: #808509
  * mm: vmscan: only read new_classzone_idx from pgdat when reclaiming
    successfully
    - LP: #808509
 -- Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>   Fri, 29 Jul 2011 14:50:19 -0300

Changed in linux (Ubuntu Natty):
status:	Fix Committed → Fix Released

Revision history for this message

Rob van der Linde (robvdl) wrote on 2012-07-05:

#84

Why does it say this bug is fixed for Natty when it is still happening? I have fully upgraded my system (to the 2.6.38-15 kernel) and it still occurs.

I have tried many of the workarounds mentioned here; none of them work, and I still get GPU hangs.

Revision history for this message

V. A. (nyappy) wrote on 2012-08-12:

#85

Still happens in Precise (12.04).

piotr zimoch (ebytyes) on 2013-05-22

Changed in xserver-xorg-video-intel (Ubuntu):
status:	Invalid → New
status:	New → Incomplete
status:	Incomplete → Opinion
status:	Opinion → Invalid
status:	Invalid → Confirmed
status:	Confirmed → In Progress
status:	In Progress → Fix Committed
status:	Fix Committed → Fix Released

Revision history for this message

Arie Skliarouk (skliarie) wrote on 2013-06-18:

#86

On a Lenovo G570, Ubuntu 12.10 worked perfectly.

After upgrading to 13.04, X-Windows started locking up (only the mouse cursor reacted to mouse movements, nothing else). The lockup occurs after a couple of minutes of working in the gnome-classic-fallback window manager; no 3D activity was going on at the time (at least not intentionally). The X lockup is accompanied by messages like these in dmesg (every second or so):

[ 533.963858] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... render ring idle [waiting on 25899, at 25898], missed IRQ?
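
For anyone collecting these messages to attach to a report, here is a minimal sketch of pulling the ring name and the two sequence numbers out of such a line. The regular expression is modeled only on the single message quoted above, and the function name `parse_hangcheck` is made up for illustration; other kernel versions may word the message differently, so treat it as a starting point rather than a complete parser.

```python
import re

# Pattern based solely on the one dmesg line quoted in this comment.
# Assumption: other hangcheck messages follow the same layout.
HANGCHECK_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+"
    r"\[drm:i915_hangcheck_ring_idle\]\s+\*ERROR\*\s+"
    r"Hangcheck timer elapsed\.\.\.\s+(?P<ring>\w+) ring idle\s+"
    r"\[waiting on (?P<waiting>\d+), at (?P<at>\d+)\]"
)


def parse_hangcheck(line):
    """Return the fields of a hangcheck message, or None if the line does not match."""
    match = HANGCHECK_RE.search(line)
    if match is None:
        return None
    return {
        "time": float(match.group("time")),    # seconds since boot, as printed by dmesg
        "ring": match.group("ring"),           # e.g. "render" or "blt"
        "waiting": int(match.group("waiting")),  # sequence number being waited on
        "at": int(match.group("at")),            # sequence number actually reached
    }


sample = (
    "[ 533.963858] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer "
    "elapsed... render ring idle [waiting on 25899, at 25898], missed IRQ?"
)
print(parse_hangcheck(sample))
```

Feeding the output of `dmesg` through this line by line and counting matches per ring would show whether the render ring or the blt ring is the one stalling.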

Revision history for this message

Arie Skliarouk (skliarie) wrote on 2013-06-18:

#87

lspci (1.4 KiB, text/plain)

Revision history for this message

Arie Skliarouk (skliarie) wrote on 2013-06-18:

#88

lshw (15.2 KiB, text/plain)

Ubuntu
xserver-xorg-video-intel package

[Sandybridge] Spurious "*ERROR* Hangcheck timer elapsed... blt ring idle" messages in dmesg when using compiz


Affects	Status	Importance	Assigned to
Linux	Fix Released	Medium	freedesktop-bugs #36241
linux (Ubuntu)	Fix Released	Medium	Unassigned
Natty	Fix Released	Medium	Robert Hooker
xserver-xorg-video-intel (Ubuntu)	Fix Released	Undecided	Unassigned
Natty	Invalid	Undecided	Unassigned
