[Sandybridge] Spurious "*ERROR* Hangcheck timer elapsed... blt ring idle" messages in dmesg when using compiz

Bug #761065 reported by Robert Hooker on 2011-04-14
This bug affects 88 people
Affects                             Status        Importance  Assigned to    Milestone
Linux                               Fix Released  Medium
linux (Ubuntu)                                    Medium      Unassigned
  Natty                                           Medium      Robert Hooker
xserver-xorg-video-intel (Ubuntu)                 Undecided   Unassigned
  Natty                                           Undecided   Unassigned

Bug Description

SRU Justification:
 Fixes a constant stream of hangcheck errors flooding dmesg, and
removes the visible stuttering that was caused by it when using 3D
applications.
Impact:
 Fixes missed interrupts on Sandy Bridge GPUs. It doesn't affect any
other GPU generation.
Fix:
 Upstream commit 498e720b96379d8ee9c294950a01534a73defcf3.
Testcase:
 1) Install mesa-utils on a system using sandybridge graphics on 11.04
 2) run vblank_mode=0 glxgears and let it run for 30 seconds or so
 3) kill it then check dmesg
 4) Without fix: hangcheck messages every ~5 seconds, massive
stuttering of the whole desktop observed. With fix: no hangcheck
messages, able to continue using the desktop.
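
Step 3 of the test case can be sketched as a small shell helper. This is only an illustration and not part of the SRU: the function name count_hangcheck is made up, and it assumes the kernel message wording is the same as in the dmesg lines quoted in this report.

```shell
# Hypothetical helper: count i915 hangcheck errors in kernel log text on stdin.
# Assumes the message wording quoted in this bug report.
count_hangcheck() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 when there are no matches.
    grep -c 'Hangcheck timer elapsed' || true
}

# Example usage, after running "vblank_mode=0 glxgears" for about 30 seconds:
#   dmesg | count_hangcheck
```

A count that keeps growing between runs would correspond to the "without fix" case described in step 4.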

This was sent to stable, but 2.6.38.y stable is dead so it will need
to be manually cherry-picked. It has been tested extensively locally
as well as by users on the bug whom I provided test kernels for. It
applies cleanly to ubuntu-natty.git.

Original bug:
-------------
Binary package hint: xserver-xorg-video-intel

When using Unity or an Ubuntu Classic session with compiz, spurious "[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 9004, at 9004], missed IRQ?" messages appear in dmesg. The display stops updating for a fraction of a second each time one is logged. This can be reproduced on demand by running vblank_mode=0 glxgears with compiz active, and it is specific to Sandy Bridge systems. Disabling sync to vblank in compiz has no effect, and the problem does not happen when using metacity.

The hangcheck messages are gone with the 2.6.39-rc3 kernel, but the root problem remains: the display still updates erratically.

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: xserver-xorg-video-intel 2:2.14.0-4ubuntu7
ProcVersionSignature: Ubuntu 2.6.38-8.42-generic 2.6.38.2
Uname: Linux 2.6.38-8-generic i686
NonfreeKernelModules: wl
Architecture: i386
CompizPlugins: [core,bailer,detection,composite,opengl,decor,mousepoll,vpswitch,regex,animation,snap,expo,move,compiztoolbox,place,grid,imgpng,gnomecompat,wall,ezoom,workarounds,staticswitcher,resize,fade,unitymtgrabhandles,scale,session,unityshell]
CompositorRunning: compiz
CurrentDmesg:
 [ 20.310859] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 9004, at 9004], missed IRQ?
 [ 32.279371] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 13936, at 13936], missed IRQ?
 [ 141.319196] exe (1718): /proc/1718/oom_adj is deprecated, please use /proc/1718/oom_score_adj instead.
DRM.card0.DP.1:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes:
 edid-base64:
DRM.card0.DP.2:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes:
 edid-base64:
DRM.card0.DP.3:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes:
 edid-base64:
DRM.card0.HDMI.A.1:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes:
 edid-base64:
DRM.card0.HDMI.A.2:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes:
 edid-base64:
DRM.card0.HDMI.A.3:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes:
 edid-base64:
DRM.card0.LVDS.1:
 status: connected
 enabled: enabled
 dpms: On
 modes: 1366x768 1366x768
 edid-base64: AP///////wAw5OsCAAAAAAAUAQSQHxF4Cp7lnV9XnCYaUFQAAAABAQEBAQEBAQEBAQEBAQEBWBtWflAADjAkMDUANa4QAAAZPhJWflAADjAkMDUANa4QAAAZAAAA/gBLSjI2MhQxNDBXSDQKAAAAAAAAQTGUAAAAAAEBCiAgAI4=
DRM.card0.VGA.1:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes:
 edid-base64:
Date: Thu Apr 14 15:45:03 2011
DistUpgraded: Fresh install
DistroCodename: natty
DistroVariant: ubuntu
DkmsStatus:
 bcmwl, 5.100.82.38+bdcom, 2.6.39-020639rc3-generic, i686: installed
 bcmwl, 5.100.82.38+bdcom, 2.6.38-7-generic, i686: installed
 bcmwl, 5.100.82.38+bdcom, 2.6.38-8-generic, i686: installed
GraphicsCard:
 Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller [8086:0126] (rev 09) (prog-if 00 [VGA controller])
   Subsystem: Dell Device [1028:0493]
InstallationMedia: Ubuntu 11.04 "Natty Narwhal" - Beta i386 (20110330)
MachineType: Dell Inc. Latitude E6420
ProcEnviron:
 LANGUAGE=en_US:en
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.38-8-generic root=UUID=4eac9a69-2bfe-4b4b-b469-5e6f7a89e0f1 ro quiet splash vt.handoff=7
Renderer: Unknown
SourcePackage: xserver-xorg-video-intel
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 02/25/2011
dmi.bios.vendor: Dell Inc.
dmi.bios.version: X66
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 9
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrX66:bd02/25/2011:svnDellInc.:pnLatitudeE6420:pvr01:rvnDellInc.:rn:rvr:cvnDellInc.:ct9:cvr:
dmi.product.name: Latitude E6420
dmi.product.version: 01
dmi.sys.vendor: Dell Inc.
version.compiz: compiz 1:0.9.4+bzr20110411-0ubuntu1
version.libdrm2: libdrm2 2.4.23-1ubuntu6
version.libgl1-mesa-dri: libgl1-mesa-dri 7.10.2-0ubuntu1
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 7.10.2-0ubuntu1
version.xserver-xorg: xserver-xorg 1:7.6+4ubuntu3
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:6.14.0-0ubuntu4
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.14.0-4ubuntu7
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:0.0.16+git20110107+b795ca6e-0ubuntu7

Robert Hooker (sarvatt) wrote :
tags: added: hwe-blocker
Bryce Harrington (bryce) on 2011-04-14
Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Triaged
Bryce Harrington (bryce) wrote :

This is almost certainly a kernel issue, but rather than close the X driver task I'll set the priority to Medium, so we can keep track of the issue on the X side.

Changed in xserver-xorg-video-intel (Ubuntu):
importance: Undecided → Medium
Adilson Oliveira (agoliveira) wrote :

I have the same issue here, but I discovered that if I run a 3D program like celestia, nexuiz or even a 3D screensaver like the one with the wireframe ant, the problem vanishes with a -39 kernel. In a discussion today, Chris Van Hoof told me he thinks the new kernel just masks the error, but at least from the user's point of view it actually solves it for the common applications users run. I kept nexuiz running for more than 2 hours and didn't notice a single freeze.
I would like to ask the priority to be raised as this can affect many new systems.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Triaged → Confirmed
tags: added: kernel-bug
Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed
Bryce Harrington (bryce) on 2011-04-15
Changed in xserver-xorg-video-intel (Ubuntu):
status: Confirmed → Triaged
Bryce Harrington (bryce) wrote :

"I would like to ask the priority to be raised as this can affect many new systems."

@Adilson, no, like I mentioned it's a kernel bug, not X. I was just leaving the X task open (but set to Medium) so that the X team could track the issue with their bugs, give it attention and help towards getting it resolved, even though there is no actual work needed to be done on the X side.

However, your comment that the issue goes away in -39 proves that it really is a kernel issue, and your comment about the priority makes me realize having the X task open is just causing confusion, so I'll just cancel it out. Hopefully you can get some attention on it from the kernel team.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Triaged → Invalid
Adilson Oliveira (agoliveira) wrote :

@Bryce: Sorry, I didn't realize this bug was against X. It is indeed related to the kernel, that's why I tagged with kernel-bug.

Adilson Oliveira (agoliveira) wrote :

Another thing I forgot to add: if one starts the session in Natty with classic Ubuntu without any effects, even with the current kernel, the problem does not appear.

Felix Engemann (felix-engemann) wrote :

I have the exact same bug on my Sandy Bridge machine: i7-2600S on Zotac H67-ITX. Kernel: 2.6.38-8-generic x86_64

Felix Engemann (felix-engemann) wrote :

Isn't this a duplicate of #762855?

Felix Engemann (felix-engemann) wrote :

I have also tested this on an ASRock H67M-ITX with an i3-2100T and an i5-2500T. Exact same symptom. I'm now pretty sure all "Sandy Bridge" integrated GPUs are affected, at least with the H67 chipset. This looks serious to me, as almost all new PCs are sold with "Sandy Bridge". Using the development kernel 2.6.39 isn't a good solution for "Sandy Bridge" owners, I guess. Any chance of getting the patch from 2.6.39 backported to 2.6.38? I'm willing to help with testing.

AceLan Kao (acelankao) wrote :

Hi,

This is a new natty kernel with 4 patches cherry-picked from the .39 kernel.
Please try it and check whether it fixes the problem, thanks.
http://people.canonical.com/~acelan/bugs/lp753189/

Thanks for the new kernel, AceLan!

I tested the x86_64 version on two different boards. It got a little better, I think: the bug doesn't happen as frequently, imho.
But if I e.g. press Alt-F2 in Unity and start typing, I still get lags and dmesg shows:

[ 85.166147] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 39358, at 39358], missed IRQ?
[ 88.283088] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 40776, at 40776], missed IRQ?
[ 109.272320] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 56389, at 56389], missed IRQ?
[ 111.370250] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 56943, at 56943], missed IRQ?

It's easier to reproduce if you play an OpenGL game, e.g. neverball.
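
In every hangcheck line quoted so far, the "waiting on" and "at" numbers are identical, which fits the "missed IRQ?" hint: the ring appears to have already reached the awaited sequence number by the time the timer fired. That reading is an interpretation of the log text, not something confirmed in this thread. A minimal sketch for pulling those fields out of a log follows; the function name hangcheck_seqnos is hypothetical.

```shell
# Hypothetical helper: for each "ring idle" hangcheck line on stdin, print
# "<ring> <waiting-on seqno> <current seqno>".
# Assumes the message format quoted in this bug report.
hangcheck_seqnos() {
    sed -n 's/.*Hangcheck timer elapsed\.\.\. \([a-z]*\) ring idle \[waiting on \([0-9]*\), at \([0-9]*\)].*/\1 \2 \3/p'
}

# Example usage:
#   dmesg | hangcheck_seqnos
```

Lines that do not match the "ring idle" format (for example the "GPU hung" variant) are skipped.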

On my setup (Dell 6420 with integrated HD3000; shows as 8086:0126 (rev 09) in lspci; running Debian 2.6.38-2), the bug appears only after a while (so it's hard to reproduce).

I tried your new kernel anyway. I got the error just once, after the boot:
[ 73.270170] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 11744, at 11744], missed IRQ?
I will stick with this kernel until the bug happens again (or not ;-))
Just tried neverball for a couple of minutes, it didn't trigger the bug.

Note: I don't know if it's relevant, but since I installed this kernel, opening the lid of the laptop turns the screen black. The screen remains OK when I close the lid, but when I re-open it, it goes black. It's not off (I still see the backlight), just black. It is then stuck in that state: switching back to text mode doesn't help. However, if I suspend to RAM and resume, the screen works again. Before using this kernel, this bug occurred exactly 50% of the time (i.e., I could close+open once, and the 2nd time would trigger the bug).

Changed in linux (Ubuntu):
status: New → Confirmed

Seems there is yet another duplicate of this bug: https://bugs.launchpad.net/ubuntu/+bug/753189

Also seems to affect Fedora 14 Kernel 2.6.38:

https://bugzilla.redhat.com/show_bug.cgi?id=684097

@AceLan Kao

I think I found the i915 patch, which would fix this bug (along with a lot of other documented i915 fixes):

from (http://www.kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.39-rc1)

commit 36d527deadf7d0c302e3452dde39465e74a65a08
Author: Chris Wilson <email address hidden>
Date: Sat Mar 19 22:26:49 2011 +0000

    drm/i915: Restore missing command flush before interrupt on BLT ring

    We always skipped flushing the BLT ring if the request flush did not
    include the RENDER domain. However, this neglects that we try to flush
    the COMMAND domain after every batch and before the breadcrumb interrupt
    (to make sure the batch is indeed completed prior to the interrupt
    firing and so insuring CPU coherency). As a result of the missing flush,
    incoherency did indeed creep in, most notable when using lots of command
    buffers and so potentially rewritting an active command buffer (i.e.
    the GPU was still executing from it even though the following interrupt
    had already fired and the request/buffer retired).

    As all ring->flush routines now have the same preconditions, de-duplicate
    and move those checks up into i915_gem_flush_ring().

    Fixes gem_linear_blit.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35284
    Signed-off-by: Chris Wilson <email address hidden>
    Reviewed-by: Daniel Vetter <email address hidden>
    Tested-by: <email address hidden>

Ok forget the post above. Seems to have nothing to do with this bug.

On kernel 2.6.38: /sys/module/i915/parameters/semaphores is set to 0
On kernel 2.6.39: /sys/module/i915/parameters/semaphores is set to 1

If I do the following:

sudo -i
echo 1 > /sys/module/i915/parameters/semaphores

the bug is also gone on 2.6.38.

If I set semaphores to 0 on 2.6.39, which is the default on 2.6.38, the bug appears there too.
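
Before and after changing the setting it helps to confirm what the driver currently reports. A minimal sketch, assuming the sysfs path quoted above; the function name show_semaphores and the optional path argument (there only so the helper can be tried against an ordinary file) are made up for illustration.

```shell
# Hypothetical helper: print the current i915 semaphores setting.
# Defaults to the sysfs path quoted in this thread; a different file
# can be passed as the first argument.
show_semaphores() {
    param_file="${1:-/sys/module/i915/parameters/semaphores}"
    if [ -r "$param_file" ]; then
        cat "$param_file"
    else
        # i915 not loaded, or the parameter is not exposed on this kernel
        echo "unavailable"
    fi
}

# Example usage:
#   show_semaphores
```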

Avi Romanoff (aroman) wrote :

I have a brand new Vostro 3550 with an i5 Sandy Bridge, and I have this issue. Thankfully (major props), Felix's change above resolved it. I hope this can be resolved for good. Is this fix likely to land in Natty for real?

@Avi Romanoff

You can put this option in /etc/default/grub to make the change permanent.
Replace the following line:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

with
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash i915.semaphores=1"
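
The same edit can be scripted. This is a rough sketch rather than a recommended tool: the function name add_kernel_param is hypothetical, it relies on GNU sed's -i option, and it edits the file in place, so try it on a copy of /etc/default/grub first.

```shell
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT
# in the given file unless it is already there. Requires GNU sed (-i).
add_kernel_param() {
    grub_file="$1"
    param="$2"
    # Nothing to do if the parameter is already on the line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$grub_file"; then
        return 0
    fi
    # Insert " <param>" just before the closing quote of the value.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$grub_file"
}

# Example usage on a copy:
#   cp /etc/default/grub /tmp/grub.test
#   add_kernel_param /tmp/grub.test i915.semaphores=1
```

On Ubuntu the change only takes effect after running "sudo update-grub" and rebooting.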

Setting /sys/module/i915/parameters/semaphores to 1 works in Ubuntu and Fedora. But in Fedora (I tested on F15, kernel 2.6.38.4) it resets to 0 after a reboot. Is there a way to permanently set semaphores to 1 in Fedora?

Otto Kekäläinen (otto) wrote :

Comment #16 fixes it on my Dell E5420.

The funny thing is, that normally the error message appears randomly, but when I insert my Nokia CS-17 3G-modem, the error always appeared three times in a row:
[ 3678.937434] usb 2-1.2: new high speed USB device using ehci_hcd and address 9
[ 3679.050002] cdc_acm 2-1.2:1.1: ttyACM0: USB ACM device
[ 3679.051045] cdc_acm 2-1.2:1.3: ttyACM1: USB ACM device
[ 3698.657860] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 1395816, at 1395816], missed IRQ?
[ 3703.191041] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 1398151, at 1398151], missed IRQ?
[ 3704.728817] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 1398192, at 1398192], missed IRQ?
[ 3718.388454] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 1403486, at 1403486], missed IRQ?

I don't know how a device in a USB port can generate error messages related to a video driver, but it does.

Guido Nickels (gsn) wrote :

Changing semaphores setting doesn't help here (Fujitsu S751 with 8086:0126 graphics controller).

I still get reproducible freezes. For example, when working with LibreOffice one always appears after only a few actions (for example, some copy+paste).

The only change is that instead of complaining about a missed IRQ it now complains about a stuck semaphore...

syslog entry with i915.semaphores=0:

May 14 22:06:48 silk kernel: [ 1665.383643] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 443994, at 443994], missed IRQ?
May 14 22:07:51 silk kernel: [ 1727.982616] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 465770, at 465770], missed IRQ?

and with i915.semaphores=1:

May 16 20:56:32 silk kernel: [25529.161099] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
May 16 20:56:32 silk kernel: [25529.161124] [drm:kick_ring] *ERROR* Kicking stuck semaphore on render ring

I tried 2.6.38-9.43 from natty as well as 2.6.39-0.5~20110427 from ppa:kernel-ppa/ppa but no difference.


i'm having similar issues...and it most certainly seems to be originating from the kernel and not from the X server, as i've run through three different versions of both the kernel and the X server while holding one fixed and changing the other and changing the X server didn't affect the problem at all, but changing the kernels did.

not sure what debugging information might be helpful in helping to shed light on my particular config, but would be happy to post some, just let me know what to post. for now, i'll just attach an lspci and give a bit of a description:

(1) started with "latest version" of ubuntu iso on thinkpad x220 with corei7-2620m and intel HD3000. the iso at the time provided me with kernel and headers version 2.6.38-8.42. aside from some thinkpad-tinkering, everything pretty much worked "out of the box", as others have reported.

(2) then, connected to dell u2711 via displayport and got a desktop that was seemingly split between two windows, and everything on the u2711 was blacked out except for the top menu and a square sitting on the lower left corner of the panel the size of the thinkpad's display. thinkpad display unresponsive during this time (but not black). rebooted with display connected and both screens black (and unusable).

(3) tried to upgrade to kernel 2.6.39-1.6, which had the same outcome as kernel 2.6.39-2.7. upon reboot, monitors set to mirror same image with 1024x768 resolution. i know this is unrelated, but here and with the 2.6.38-8.42 kernel, the Fn-F7 worked for switching between resolutions. anyway, upon switching between resolutions i got the same outcome as in (2), except that now i could use the Fn-F7 to switch between non-working full-resolutions and working (mirrored) 1024x768 resolutions.

(4) after a bit of digging around, found a recommendation to try kernel 2.6.38-996-generic #201103251543 SMP Fri Mar 25 15:47:37 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux, and accompanying headers, which worked properly. currently using xorg-server 2:1.10.1+git20110429+server-1.10-branch.b4455b11-0ubuntu0sarvatt, but switching to 2.14.0ubuntu7.1 from the natty updates, as well as using 2.14-0-4ubuntu7 (natty) made absolutely no difference. these three xorg versions were used in all my testing.

At (4) both displays are functioning properly. however, i cannot switch between monitors without rebooting. whatever monitor i boot in works, but if that's the u2711 fed through the display port, then if i unplug the display port cable, the thinkpad LCD is not switched to automatically (and remains off, not just black). likewise, if i boot with the thinkpad display active, plugging in the u2711 9-times-out-of-10 causes both screens to go black, the Fn-F7 doesn't work, and to fix the problem i have to hard-reboot. it's interesting to note that in the gnome monitors panel, only the active display shows. in other words, if i boot while connected to the u2711, it shows (but the built-in laptop screen doesn't), and if i boot without the external display connected, the thinkpad LCD shows, but once i plug in the u2711 (without rebooting) the 1 time out of 10 the display doesn't go nuts on me, it only shows the u2711 and not ...


Raul Dias (raul-dias) wrote :

doing what Felix proposed at #16 didn't work on natty (2.6.38-8-generic):
     # echo 1 > /sys/module/i915/parameters/semaphores
     -bash: echo: write error: Invalid argument

However, doing what was proposed in #18 solved the problem.

XPS L502X with optimus/bumblebee working

Changed in linux (Ubuntu):
importance: Undecided → Medium

quick follow up to post #22...went over to 2.6.37-gentoo-r4 and compiled in i915 support. works fine. can post kernel options if necessary.

vilmos (vilmos) wrote :

Using the latest 2.6.38 kernel from Natty stable, this message popped up in dmesg a few times every hour (my system coming to a complete halt for 1-2 seconds each time), and after a few hours my system eventually froze (this is a Lenovo X220 with Sandy Bridge). Setting the semaphores parameter to "1" as suggested above helped somewhat, but I still get the hangcheck errors and freezes.

Installing 2.6.39-0.5~20110427 from kernel-ppa improved on the situation, but eventually after a bit more than a day my system froze again with the following message in dmesg:

May 30 08:02:31 x220 kernel: [26264.491209] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
May 30 08:02:31 x220 kernel: [26264.491229] [drm:kick_ring] *ERROR* Kicking stuck semaphore on blt ring
May 30 08:02:32 x220 kernel: [26265.987805] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
May 30 08:02:32 x220 kernel: [26265.987828] [drm:kick_ring] *ERROR* Kicking stuck semaphore on blt ring
May 30 08:02:34 x220 kernel: [26267.484409] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
May 30 08:02:34 x220 kernel: [26267.484428] [drm:kick_ring] *ERROR* Kicking stuck semaphore on blt ring
May 30 08:02:35 x220 kernel: [26268.981019] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
May 30 08:02:35 x220 kernel: [26268.981039] [drm:kick_ring] *ERROR* Kicking stuck semaphore on blt ring
May 30 08:02:37 x220 kernel: [26270.477625] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed...May 30 08:04:43 x220 kernel: imklog 4.6.4, log source = /proc/kmsg started.

Semaphore is set to "1" by default on 2.6.39, although it seems from the kernel git log that this change has been reverted since. Now I'm testing 2.6.39 with semaphore set to "0" to see if it eliminates the problem.

vilmos (vilmos) wrote :

With semaphore set to "0" on 2.6.39-0.5~20110427:

[ 5136.145217] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 1132622, at 1132622], missed IRQ?
[ 5382.346399] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 1242913, at 1242913], missed IRQ?
[ 5551.951435] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 1327817, at 1327817], missed IRQ?

It's actually easy to reproduce this (at least on my system): just click the ubuntu menu in the upper left corner, and start typing in the search box until the UI freezes for 1-2 seconds.

si14 (a2alt) wrote :

Is there any hope to get this fixed?

Pascal Hartig (passy) wrote :

See #18, the workaround is doing a great job for me.

si14 (a2alt) wrote :

It seems working, but there is something strange:
root@si14-laptop:~# cat /etc/default/grub | grep semaph
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash i915.semaphores=1"
root@si14-laptop:~# cat /sys/module/i915/parameters/semaphores
0
And this is so after reboot.

Anton Belyaev (anton-belyaev) wrote :

si14, check out the header of /etc/default/grub: it says you need to run update-grub after the modification.
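
A quicker way to tell whether the option actually reached the running kernel is to look at the boot command line rather than the grub config file. A minimal sketch, where the function name cmdline_has_param is hypothetical:

```shell
# Hypothetical helper: succeed if the given kernel command line contains
# the given parameter as a whole, space-separated word.
cmdline_has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage:
#   cmdline_has_param "$(cat /proc/cmdline)" "i915.semaphores=1" && echo "active"
```

If the parameter is in /etc/default/grub but missing from /proc/cmdline, the generated grub configuration has not been refreshed yet.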

si14 (a2alt) wrote :

Thanks, you are right.

vilmos (vilmos) wrote :

Yesterday I updated from the xorg-edgers ppa and the bug seems to be gone now. I even set the semaphore parameter to "0", which was a sure way to trigger this problem, and now there are no error messages about GPU hangs in dmesg.

This Phoronix article also suggests that this problem was fixed in mesa 7.11-devel, not in the kernel:

http://www.phoronix.com/scan.php?page=article&item=intel_snb_natty&num=1

Right now I have the latest xorg-edgers packages with kernel 2.6.39-0.5~20110427 from kernel-ppa , and everything is running smoothly.

Any Ubuntu developers care to track down which mesa commit(s) should be backported to Natty? Would be great if I could go back to stable Natty packages.

Pascal Hartig (passy) wrote :

@vilmos Did you notice any change regarding the power consumption? I read that the issue is related to power saving and I wonder if the fix has any impact on that.

vilmos (vilmos) wrote :

@Pascal no I did not, however I updated the packages only yesterday, so I can't really tell yet. I did not make any measurements on battery life before the update either, so I don't think I'll be able to give you exact numbers on any change on battery life.

Mathias Dietrich (theghost) wrote :

Workaround from #18 works fine here.

The system no longer gets stuck. It would be nice to see a fix for this issue by default in Ubuntu, especially for new users with new hardware and little knowledge.

tobyS (tobias-schlitt) wrote :

Fix #16 works fine so far here (2.6.38-8-generic, X220, i7-2620M Sandybridge). Thanks!

However, I'm additionally affected by #22 (4): I cannot dock/undock the X220 with the Lenovo Minidock Plus, where I have 2 screens attached via display port. With a fresh boot there are initially some issues with the desktop background, which vanish after starting an app in full screen mode. However, switching between the display setups (undocked = notebook display, docked = 2 externals) is not possible without a reboot.

Bilal Akhtar (bilalakhtar) wrote :

I'm using Ubuntu Oneiric with the 3.0 kernel in the repositories and the entire X stack from the xorg-edgers PPA. I don't get this message anymore (got it when I was using Natty with the main repository kernel and Natty X (not from xorg-edgers)) and the display doesn't hang anymore when typing in the Unity search dialog.

So if X is going to be updated this cycle, then this issue would be fixed for good in Oneiric. The kernel has already been upgraded to 3.0, so X is the only thing which needs an upstream upgrade.

Robert Hooker (sarvatt) wrote :

Someone has posted a patch that fixes this issue on the intel-gfx mailing list, and it should hopefully be in 3.0-rc4. Afterwards we will be able to cherry-pick it into stable. I have test kernels available here that fix the issue, with the side effect of also massively speeding up 3D with the default i915.semaphores=0 in my tests. Please do post your results if you test them.

fix:
https://patchwork.kernel.org/patch/879532/

test kernels:
http://kernel.ubuntu.com/~sarvatt/lp761065/

Robert Hooker (sarvatt) on 2011-06-17
Changed in linux (Ubuntu):
status: Confirmed → Triaged
vilmos (vilmos) wrote :

Have been running the kernel from #38 for a few hours now, and it seems to fix the bug. No GPU lockups or hangs, no error messages in dmesg.

si14 (a2alt) wrote :

It looks like #38 fixes the bug. But I have some errors in dmesg (they of course can be unrelated to #38):

[ 1406.247369] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro,data=ordered,discard,commit=0
[ 1406.252920] EXT4-fs (sda1): re-mounted. Opts: data=ordered,discard,commit=0
[ 1406.260472] EXT4-fs (sda5): re-mounted. Opts: user_xattr,data=ordered,discard,commit=0
[ 1406.273500] thinkpad_acpi: THERMAL ALERT: unknown thermal alarm received
[ 1406.273509] thinkpad_acpi: unhandled HKEY event 0x6040
[ 1406.273514] thinkpad_acpi: please report the conditions when this event happened to <email address hidden>
[ 1406.274433] thinkpad_acpi: EC reports that Thermal Table has changed

vilmos (vilmos) wrote :

@si14: that's an unrelated issue and harmless. Also happens with the latest stable natty kernel.

Steffen Rusitschka (rusi) wrote :

#38 fixes it for me, too (semaphores=0). Thanks!

Changed in linux:
status: Confirmed → Fix Released
Jonathan Davies (jpds) wrote :

The proposed patched kernel has been working fine for me for two days.

Robert Hooker (sarvatt) on 2011-06-21
description: updated
Tim Gardner (timg-tpi) on 2011-06-22
Changed in xserver-xorg-video-intel (Ubuntu Natty):
status: New → Invalid
Changed in linux (Ubuntu Natty):
assignee: nobody → Robert Hooker (sarvatt)
status: New → Fix Committed
vilmos (vilmos) wrote :

After using the kernel from #38 for a few days I got a GPU hang again:

Jun 25 10:58:05 x220 kernel: [154505.782665] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Jun 25 10:58:05 x220 kernel: [154505.782788] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 79090 at 79089, next 79117)

This, however:

https://bugzilla.redhat.com/show_bug.cgi?id=684097

indicates that this needs to be applied for the BSD ring too (see second part of the fix).

@Robert, could you create kernel packages with both patches from the Red Hat bugzilla page applied? Thanks.

The kernel at #38 works for me too. Thank-you!

Robert Hooker (sarvatt) wrote :

vilmos: Do you even have i965-va-driver installed? If not, it won't be any help; the BSD ring is used for H.264 acceleration through libva.

vilmos (vilmos) wrote :

@Robert I don't know if any of my installed applications use hw h264 acceleration, but once in a while (every few days) I still experience GPU hangs even with the kernel from #38. The situation improved a LOT though with it.

I can confirm that the fix in #38 works on my X220T. I haven't updated the bios or made any other graphical changes.

B.B. Lauret (bblauret) wrote :

What does 'applied' mean in this message? https://lists.ubuntu.com/archives/kernel-team/2011-June/015962.html

There isn't a new kernel in natty-updates, and in natty-proposed this bug isn't mentioned (http://kernel.ubuntu.com/~kernel-ppa/reports/sru-report.html)...

Am I misunderstanding something?

Luigi R. (xluigi84) wrote :

Same error with an HP DV6 6030el, i72630 and AMD Radeon HD 6470M. I cannot use the radeon driver because Natty freezes during boot, so I have to use i915. But sometimes, at random, I get a freeze with this driver too. Keyboard and mouse are locked, and Alt+SysRq (Print Screen) REISUB doesn't work. Looking at the kernel log a few minutes before the freeze I can find this error: [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed. Now I'm using the solution in #18 and it seems to work fine. I can't understand why this bug is classified "medium"... freezes like these force hard shutdowns, with a high risk of damage to the hardware. I was close to going back to Windows just to safeguard my purchase. Please fix it!!! Thanks

Luigi R. (xluigi84) wrote :

Ps: The freezes occur using Unity.

Robert Hooker (sarvatt) wrote :

bblauret: it means it's applied to the pending -proposed kernel git tree. It will be in the 2.6.38-11 kernel whenever that release is started, and should hopefully be in -proposed when 2.6.38-10 migrates to -updates next week.

Luigi R. (xluigi84) wrote :

I tried also the new kernel 2.6.39 of kernel ppa repository but same bug occurs.

Robert Hooker (sarvatt) wrote :

closing oneiric task, this was fixed in 3.0-rc4 released there some time ago

Changed in linux (Ubuntu):
status: Triaged → Fix Released
Chris Van Hoof (vanhoof) on 2011-07-12
Changed in linux (Ubuntu Natty):
importance: Undecided → Medium
Changed in xserver-xorg-video-intel (Ubuntu):
importance: Medium → Undecided
Luigi R. (xluigi84) wrote :

I got a total freeze again yesterday, but I don't know if it is linked to this bug. The freeze occurs when the cosmo screensaver is left running. Can you check whether it is the same for you, please?

Luigi R. (xluigi84) wrote :

Sorry Robert Hooker, I didn't read your message. Is there a patch for natty, or do I have to wait for oneiric?

B.B. Lauret (bblauret) wrote :

Robert Hooker: Thanks for your explanation. Now that 2.6.38-10 has hit -updates, I look forward to the following kernel in -proposed.

Matthias Schmidt (mschmidt) wrote :

I see the strange "bsd ring behavior" when using the fix from #18 on my Thinkpad X220. Adding the semaphores option to grub renders my system unusable. The console is no longer high-res and I only see graphics errors in X, with no chance to work. I get the following error:

[drm:init_ring_common] *ERROR* gen6 bsd ring initialization failed ctl 0001f003 head 00000000 tail 00000000 start 00022000
[drm:i915_driver_load] *ERROR* failed to init modeset

If I boot the system w/o the grub parameter and set it afterwards (via /sys), it seems to work.

So take care if you put the fix into the grub config.

Maarten Kossen (mpkossen) wrote :

I've experienced this bug as well on a Clevo W150HRM with a Sandy Bridge Core i7 (2820QM). The solution in #16 fixes it for me. I have yet to try #18, but I'm sure it'll work as well. Running 2.6.38-10-generic.

Matthias Schmidt (mschmidt) wrote :

The fix in #18 solves the "micro-hangs" with the Unity launcher and X in general for me, but the system still freezes after some uptime. It is completely locked up and only a hard reboot "solves" the freeze. Is anybody also seeing this?

Stephen Rees-Carter (valorin) wrote :

The fix in #18 solves micro-freeze problems with OpenGL (i.e. games), but it does not solve them for normal usage.
For example, it has locked up 3 times during the writing of this message.

Also note, I have 'noapic' in my GRUB config as well, as that is the only way I can get my laptop to successfully boot.
The full line is:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash noapic i915.semaphores=1"

Bug #806434 has the full specs of my laptop for those who are interested.
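
To check whether an option from a line like the one above actually reached the kernel, something along these lines could be used. This is only a sketch: the function name is hypothetical, it does plain whitespace splitting (no handling of quoted values), and it returns the first match.

```python
def kernel_param(cmdline, name):
    """Return the value of a name=value token on a kernel command line, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name and sep:
            return value
    return None


# The options from the GRUB line quoted in this comment.
cmdline = "quiet splash noapic i915.semaphores=1"
print(kernel_param(cmdline, "i915.semaphores"))  # prints: 1
```

On a running system the string to inspect would come from /proc/cmdline rather than from the grub config file.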

Herton R. Krzesinski (herton) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-natty' to 'verification-done-natty'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-natty
Adam Glasgall (aglasgall) wrote :

2.6.38-11 in -proposed seems to fix the problem for me on my ThinkPad x220.

Matthias Schmidt (mschmidt) wrote :

Can confirm this. 2.6.38-11 absolutely fixes the hangs for me. Not sure about the freezes, as this needs some time. Nevertheless, the fix should go in.

tags: added: verification-done-natty
removed: verification-needed-natty

Confirmed. On a Lenovo ThinkPad Edge E420, 2.6.38-11 fixes the problem and Unity now works much more comfortably.

Stephen Rees-Carter (valorin) wrote :

Confirmed, it fixes the semaphore problem on my ThinkPad L520 :)

It doesn't remove the need for noapic though, but that is likely a different issue.

Maarten Kossen (mpkossen) wrote :

#18 doesn't fix it on 2.6.38-10, #16 does.

B.B. Lauret (bblauret) wrote :

Confirmed, 2.6.38-11 fixes my issues.

Confirmed! This bug (blt ring idle) seems to be fixed in 2.6.38-11. There are still other issues and GPU hangs though, for example after 2 seconds of playing sauerbraten in full-screen:

[ 54.562391] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 54.563417] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 36628 at 36618, next 36629)

I don't know if it's related to this bug though. It seems to be related to xserver-xorg-video-intel. All these Sandy Bridge GPU issues are really annoying, and this is nearly 8 months after the first Sandy Bridge release ...
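
When comparing logs like the two lines above, it can help to pull the sequence numbers out of each i915_do_wait_request message. The sketch below assumes that "awaiting" is the sequence number being waited for and "at" is the one the GPU has reached; that reading of the message, and the function name, are assumptions rather than anything stated in this thread.

```python
import re

# Matches the i915_do_wait_request error line as it appears in dmesg.
WAIT_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm:i915_do_wait_request\].*"
    r"awaiting (?P<awaiting>\d+) at (?P<at>\d+), next (?P<next>\d+)"
)


def parse_wait_request(line):
    """Return timestamp and sequence numbers from one log line, or None."""
    m = WAIT_RE.search(line)
    if m is None:
        return None
    awaiting, at = int(m["awaiting"]), int(m["at"])
    return {
        "time": float(m["time"]),
        "awaiting": awaiting,
        "at": at,
        "behind": awaiting - at,  # how far the GPU lags the awaited request
    }


line = (
    "[ 54.563417] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request "
    "returns -11 (awaiting 36628 at 36618, next 36629)"
)
print(parse_wait_request(line))
```

A "behind" value that keeps growing while "at" stays fixed would suggest the GPU has stopped making progress, as opposed to a single missed interrupt.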

exactt (giesbert) wrote :

@felix

Also got errors after a while with the latest proposed kernel. I think we should file a new bug report. Would you mind doing that?

[21589.558177] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[21589.559706] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 13353483 at 13353481, next 13353484)
[21595.961958] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[21595.962008] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 13353491 at 13353488, next 13353492)
[21602.395716] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[21602.395773] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 13353502 at 13353492, next 13353503)
[21608.789507] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[21608.789568] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 13353511 at 13353492, next 13353512)
[21615.183307] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[21615.183367] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 13353522 at 13353492, next 13353523)
[21617.091445] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[21617.091482] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 13353525 at 13353492, next 13353535)
[21617.091710] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[21617.091714] [drm:i915_reset] *ERROR* Failed to reset chip.
[21617.091799] compiz[1548]: segfault at 0 ip 00007f7a5b4f4be8 sp 00007fff7a7df7e0 error 6 in i965_dri.so[7f7a5b483000+ac000]
[21620.669148] compiz[6719]: segfault at 0 ip 00007f3fefdd1acc sp 00007fff7cab5e00 error 6 in i965_dri.so[7f3fefdb2000+ac000]
[21624.072709] compiz[6732]: segfault at 0 ip 00007fe1b4a73acc sp 00007fffa1461b00 error 6 in i965_dri.so[7fe1b4a54000+ac000]
[21627.516388] compiz[6735]: segfault at 0 ip 00007f9dcf6deacc sp 00007fffb462c580 error 6 in i965_dri.so[7f9dcf6bf000+ac000]
[21630.932591] compiz[6744]: segfault at 0 ip 00007fc266a55acc sp 00007fffb08f1f40 error 6 in i965_dri.so[7fc266a36000+ac000]
[21634.313294] compiz[6747]: segfault at 0 ip 00007f4ae8c08acc sp 00007fff856b7910 error 6 in i965_dri.so[7f4ae8be9000+ac000]
[21637.743397] compiz[6749]: segfault at 0 ip 00007f7bdb4a2acc sp 00007fff870d1b30 error 6 in i965_dri.so[7f7bdb483000+ac000]
[21641.178166] compiz[6751]: segfault at 0 ip 00007f4e534a2acc sp 00007fffd2903860 error 6 in i965_dri.so[7f4e53483000+ac000]
[21644.626419] compiz[6753]: segfault at 0 ip 00007f1b16c61acc sp 00007fff6fa3d4e0 error 6 in i965_dri.so[7f1b16c42000+ac000]
[21648.052081] compiz[6755]: segfault at 0 ip 00007f0566a55acc sp 00007fffc7c3e4b0 error 6 in i965_dri.so[7f0566a36000+ac000]
[21651.557582] compiz[6758]: segfault at 0 ip 00007f3a012f2acc sp 00007fffbe4fa580 error 6 in i965_dri.so[7f3a012d3000+ac000]
[21654.987556] compiz[6760]: segfault at 0 ip 00007ffc98681acc sp 00007fff3cbcbd60 error 6 in i965_dri.so[7ffc98662000+ac000]
[21658.410040] compiz[6762]: segfault at 0 ip 00007f60b431cacc sp 00007fff046c12c0 error 6 in i965...


exactt (giesbert) wrote :

@felix @all: already found a bug which sounds just like the problem: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/805586

@exactt (giesbert)

yes sounds like this bug - although i do not experience segfaults with compiz.

Luigi R. (xluigi84) wrote :

No errors found with the new kernel 2.6.38-11. I also updated xserver-xorg-video-intel from the proposed repo. No freezes so far... I hope everything has been fixed! Thanks!

Matthias Schmidt (mschmidt) wrote :

Just for the record: After one week of extensive testing, -11 also fixes the regular freezes. The system is rock solid now!

Stephen Rees-Carter (valorin) wrote :

After a week of testing, I'm still experiencing occasional system-wide freezes as well as frequent mini-freezes. I guess my hardware is different...

Steffen Rusitschka (rusi) wrote :

-11 fixes the short hangs (no more dmesg alerts) but my system still freezes completely (reboot required) once or twice a day. Dell Latitude 5520.

cuc (cuc+) wrote :

-11 fixes the short hangs, though i often have even heavier lag after standby or hibernate...
samsung rf511

Luigi R. (xluigi84) wrote :

I can confirm... the total freezes still exist. :-( There is no error message in the kernel log. I'm destroying my new notebook by forcing shutdowns. Do you need any logs to find the solution? I am available for testing.

HP DV6 6030el i7 2630

Robert Hooker (sarvatt) wrote :

This bug was never about complete system hangs, just stuttering caused when using GL. Please do file new bugs if you are having hangs.

Luigi R. (xluigi84) wrote :

It usually happens when I use programs like Tecplot or the Fluent post-processing tool. I suppose it is related to a GL issue.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.38-11.48

---------------
linux (2.6.38-11.48) natty-proposed; urgency=low

  [Herton R. Krzesinski]

  * Release Tracking Bug
    - LP: #818175

  [ Upstream Kernel Changes ]

  * Revert "HID: magicmouse: ignore 'ivalid report id' while switching
    modes"
    - LP: #814250

linux (2.6.38-11.47) natty-proposed; urgency=low

  [Steve Conklin]

  * Release Tracking Bug
    - LP: #811180

  [ Keng-Yu Lin ]

  * SAUCE: Revert: "dell-laptop: Toggle the unsupported hardware
    killswitch"
    - LP: #775281

  [ Ming Lei ]

  * SAUCE: fix yama_ptracer_del lockdep warning
    - LP: #791019

  [ Stefan Bader ]

  * SAUCE: Re-enable RODATA for i386 virtual
    - LP: #809838

  [ Tim Gardner ]

  * [Config] Add grub-efi as a recommended bootloader for server and
    generic
    - LP: #800910
  * SAUCE: rtl8192se: Force a build for a 2.6/3.0 kernel
    - LP: #805494

  [ Upstream Kernel Changes ]

  * Revert "bridge: Forward reserved group addresses if !STP"
    - LP: #793702
  * Fix up ABI directory
  * bonding: Incorrect TX queue offset, CVE-2011-1581
    - LP: #792312
    - CVE-2011-1581
  * fs/partitions/efi.c: corrupted GUID partition tables can cause kernel
    oops
    - LP: #795418
    - CVE-2011-1577
  * usbnet/cdc_ncm: add missing .reset_resume hook
    - LP: #793892
  * ath5k: Disable fast channel switching by default
    - LP: #767192
  * mm: vmscan: correctly check if reclaimer should schedule during
    shrink_slab
    - LP: #755066
  * mm: vmscan: correct use of pgdat_balanced in sleeping_prematurely
    - LP: #755066
  * ALSA: hda - Use LPIB for ATI/AMD chipsets as default
    - LP: #741825
  * ALSA: hda - Enable snoop bit for AMD controllers
    - LP: #741825
  * ALSA: hda - Enable sync_write workaround for AMD generically
    - LP: #741825
  * cpuidle: menu: fixed wrapping timers at 4.294 seconds
    - LP: #774947
  * drm/i915: Fix gen6 (SNB) missed BLT ring interrupts.
    - LP: #761065
  * USB: ehci: remove structure packing from ehci_def
    - LP: #791552
  * drm/i915: disable PCH ports if needed when disabling a CRTC
    - LP: #791752
  * kmemleak: Do not return a pointer to an object that kmemleak did not
    get
    - LP: #793702
  * kmemleak: Initialise kmemleak after debug_objects_mem_init()
    - LP: #793702
  * Fix _OSC UUID in pcc-cpufreq
    - LP: #793702
  * CPU hotplug, re-create sysfs directory and symlinks
    - LP: #793702
  * Fix memory leak in cpufreq_stat
    - LP: #793702
  * net: recvmmsg: Strip MSG_WAITFORONE when calling recvmsg
    - LP: #793702
  * ftrace: Only update the function code on write to filter files
    - LP: #793702
  * qla2xxx: Fix hang during driver unload when vport is active.
    - LP: #793702
  * qla2xxx: Fix virtual port failing to login after chip reset.
    - LP: #793702
  * qla2xxx: Fix vport delete hang when logins are outstanding.
    - LP: #793702
  * powerpc/kdump64: Don't reference freed memory as pacas
    - LP: #793702
  * powerpc/kexec: Fix memory corruption from unallocated slaves
    - LP: #793702
  * x86, cpufeature: Fix cpuid leaf 7 feature detection
    - LP: #793702
  * ath9k_hw: do noise floor calibration only on required chain...

Changed in linux (Ubuntu Natty):
status: Fix Committed → Fix Released
Rob van der Linde (robvdl) wrote :

Why does it say this bug is fixed for natty, when it still is happening? I have fully upgraded my system (to the 2.6.38-15 kernel) and it's still happening.

I have tried a lot of workarounds mentioned here, none work, I still get GPU hangs.

V. A. (nyappy) wrote :

Still happens in Precise (12.04).

piotr zimoch (ebytyes) on 2013-05-22
Changed in xserver-xorg-video-intel (Ubuntu):
status: Invalid → New
status: New → Incomplete
status: Incomplete → Opinion
status: Opinion → Invalid
status: Invalid → Confirmed
status: Confirmed → In Progress
status: In Progress → Fix Committed
status: Fix Committed → Fix Released
Arie Skliarouk (skliarie) wrote :

On a Lenovo G570, Ubuntu 12.10 worked perfectly.

After upgrading to 13.04, X started locking up (only the mouse cursor reacted to mouse movements, nothing else). The lockup occurs after a couple of minutes of working in the gnome-classic-fallback window manager; no 3D activity was going on at the time (at least not intentionally). The X lockup is accompanied by messages like these in dmesg (every second or so):

[ 533.963858] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... render ring idle [waiting on 25899, at 25898], missed IRQ?
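
Since the original report was about the blt ring and the message above names the render ring, a quick tally of which ring each "ring idle" message refers to may help decide whether a log belongs to this bug. This is a rough sketch; the function name is made up, and the second sample line is the one quoted in the bug description.

```python
import re
from collections import Counter

# Matches the "ring idle" hangcheck message and captures the ring name.
IDLE_RE = re.compile(
    r"Hangcheck timer elapsed\.\.\. (?P<ring>\w+) ring idle "
    r"\[waiting on (?P<waiting>\d+), at (?P<at>\d+)\]"
)


def count_idle_rings(lines):
    """Count 'ring idle' hangcheck messages per ring name."""
    counts = Counter()
    for line in lines:
        m = IDLE_RE.search(line)
        if m:
            counts[m["ring"]] += 1
    return counts


sample = [
    "[ 533.963858] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer "
    "elapsed... render ring idle [waiting on 25899, at 25898], missed IRQ?",
    "[drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... "
    "blt ring idle [waiting on 9004, at 9004], missed IRQ?",
]
print(count_idle_rings(sample))
```

Passing it the lines of saved dmesg output instead of the sample list gives a per-ring count for a whole session.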
