[i915gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x02000004)

Bug #714719 reported by mkis62 on 2011-02-07
This bug affects 6 people
Affects                            Status        Importance  Assigned to     Milestone
xf86-video-intel                   Fix Released  High
linux (Ubuntu)                     Fix Released  Medium      Andy Whitcroft
xserver-xorg-video-intel (Ubuntu)  Invalid       High        Unassigned

Bug Description

X crashed while setting preferences in Decibel Audio Player
tty1-6 works ... rebooting...

From GPU dump:
ACTHD: 0xffffffff
EIR: 0x00000000
EMR: 0xffffffed
ESR: 0x00000001
PGTBL_ER: 0x00000000
IPEHR: 0x02000004
IPEIR: 0x00000000
INSTDONE: 0x03c7c081
    busy: IDCT
    busy: IQ
    busy: PR
    busy: VLD
    busy: Instruction parser
    busy: Windowizer
    busy: Intermediate Z
    busy: Perspective interpolation
    busy: Texture decompression
    busy: Sampler Cache
    busy: Filtering
    busy: Bypass FIFO
    busy: Pixel shader
    busy: Color calculator
    busy: Map L2
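
As a reading aid for the register dump above, the following sketch lists which bits are set in ESR and which error sources EMR leaves unmasked. The bit names are an assumption recalled from the kernel's i915_reg.h for this hardware generation and are not stated anywhere in this report; the helper names are hypothetical.

```python
# Bit names are an assumption (recalled from the kernel's i915_reg.h);
# they are not given anywhere in this report.
ERROR_BITS = {
    0: "instruction error",
    1: "memory refresh error",
    4: "page table error",
}


def set_bits(value, width=32):
    """Return the positions of all bits set in value."""
    return [bit for bit in range(width) if value & (1 << bit)]


def describe(bits):
    """Map bit positions to names, marking positions we have no name for."""
    return [ERROR_BITS.get(bit, "bit %d (unknown)" % bit) for bit in bits]


EMR = 0xFFFFFFED  # error mask register: a 1 bit masks that error source
ESR = 0x00000001  # error status register

unmasked = set_bits(~EMR & 0xFFFFFFFF)
raised = set_bits(ESR)

print("unmasked error sources:", describe(unmasked))
print("raised error status:   ", describe(raised))
```

Under these assumptions the only raised status bit is one that EMR masks, which would be consistent with EIR reading 0x00000000 in the dump.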

From dmesg:
[ 2026.252160] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 2026.254795] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 402290 at 402288, next 402291)

ProblemType: Crash
DistroRelease: Ubuntu 11.04
Package: xserver-xorg-video-intel 2:2.14.0-1ubuntu6
ProcVersionSignature: Ubuntu 2.6.38-2.29-generic 2.6.38-rc3
Uname: Linux 2.6.38-2-generic i686
Architecture: i386
Chipset: i915gm
CompisitorRunning: None
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
DRM.card0.LVDS.1:
 status: connected
 enabled: enabled
 dpms: On
 modes: 1024x768
 edid-base64: AP///////wANrwYVAAAAACgMAQOAHhd4CnfxoFpLliQYT1QACAABAQEBAQEBAQEBAQEBAQEBZBkAQEEAJjAYiDYAMOQQAAAYAAAA/gBOMTUwWDMtTDA3CiAgAAAA/gBDTU8KICAgICAgICAgAAAA/gBOMTUwWDMtTDA3CiAgAOs=
DRM.card0.VGA.1:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes:
 edid-base64:
Date: Mon Feb 7 18:50:19 2011
DistUpgraded: Yes, recently upgraded Log time: 2011-01-03 14:04:23.058239
DistroCodename: natty
DistroVariant: ubuntu
DumpSignature: 82856c05
ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
GconfCompiz:

GraphicsCard:
 Subsystem: Acer Incorporated [ALI] Device [1025:006a]
   Subsystem: Acer Incorporated [ALI] Device [1025:006a]
InterpreterPath: /usr/bin/python2.7
MachineType: Acer TravelMate 2410
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdline: /usr/bin/python /usr/share/apport/apport-gpu-error-intel.py
ProcEnviron:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.38-2-generic root=UUID=263aecd1-0156-49f9-8d5e-99e8079b240f ro gfxpayload=true quiet splash vt.handoff=7
ProcKernelCmdLine_: BOOT_IMAGE=/boot/vmlinuz-2.6.38-2-generic root=UUID=263aecd1-0156-49f9-8d5e-99e8079b240f ro gfxpayload=true quiet splash vt.handoff=7
RelatedPackageVersions:
 xserver-xorg 1:7.6~3ubuntu3
 libdrm2 2.4.23-1ubuntu3
 xserver-xorg-video-intel 2:2.14.0-1ubuntu6
Renderer: Hardware acceleration
SourcePackage: xserver-xorg-video-intel
Title: [i915gm] GPU lockup 82856c05
UserGroups:

dmi.bios.date: 02/07/2006
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: V1.09
dmi.board.name: Morar
dmi.board.vendor: Acer
dmi.board.version: Rev
dmi.chassis.asset.tag: None
dmi.chassis.type: 10
dmi.chassis.vendor: Acer
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvrV1.09:bd02/07/2006:svnAcer:pnTravelMate2410:pvr0100:rvnAcer:rnMorar:rvrRev:cvnAcer:ct10:cvrN/A:
dmi.product.name: TravelMate 2410
dmi.product.version: 0100
dmi.sys.vendor: Acer
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.23-1ubuntu3
version.libgl1-mesa-glx: libgl1-mesa-glx 7.10-1ubuntu1
version.xserver-xorg: xserver-xorg 1:7.6~3ubuntu3
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:6.13.2+git20110124.fadee040-0ubuntu4
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.14.0-1ubuntu6
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:0.0.16+git20110107+b795ca6e-0ubuntu4

mkis62 (mihaikx62) wrote :
Bryce Harrington (bryce) on 2011-02-07
description: updated
Bryce Harrington (bryce) on 2011-02-07
summary: - [i915gm] GPU lockup 82856c05
+ [i915gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x02000004)
Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Triaged
importance: Undecided → High

Forwarding this bug from Ubuntu reporter mkis62:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/714719

[Problem]
GPU lockup (of the "Hangcheck timer elapsed" variety) on 2.6.38-2 kernel and 2.14.0 intel driver with i915gm hardware. No compositor is running.

[Original Description]
X crashed while setting preferences in Decibel Audio Player
tty1-6 works ... rebooting...

From GPU dump:
ACTHD: 0xffffffff
EIR: 0x00000000
EMR: 0xffffffed
ESR: 0x00000001
PGTBL_ER: 0x00000000
IPEHR: 0x02000004
IPEIR: 0x00000000
INSTDONE: 0x03c7c081
    busy: IDCT
    busy: IQ
    busy: PR
    busy: VLD
    busy: Instruction parser
    busy: Windowizer
    busy: Intermediate Z
    busy: Perspective interpolation
    busy: Texture decompression
    busy: Sampler Cache
    busy: Filtering
    busy: Bypass FIFO
    busy: Pixel shader
    busy: Color calculator
    busy: Map L2

From dmesg:
[ 2026.252160] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 2026.254795] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 402290 at 402288, next 402291)

DistroRelease: Ubuntu 11.04
Package: xserver-xorg-video-intel 2:2.14.0-1ubuntu6
ProcVersionSignature: Ubuntu 2.6.38-2.29-generic 2.6.38-rc3
Uname: Linux 2.6.38-2-generic i686
Architecture: i386
Chipset: i915gm
CompisitorRunning: None
DRM.card0.LVDS.1:
 status: connected
 enabled: enabled
 dpms: On
 modes: 1024x768
 edid-base64:
DRM.card0.VGA.1:
 status: disconnected
 enabled: disabled
 dpms: Off
 modes:
 edid-base64:
Date: Mon Feb 7 18:50:19 2011
DistUpgraded: Yes, recently upgraded Log time: 2011-01-03 14:04:23.058239
DistroCodename: natty
DistroVariant: ubuntu
DumpSignature: 82856c05
ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
GconfCompiz:

GraphicsCard:
 Subsystem: Acer Incorporated [ALI] Device [1025:006a]
   Subsystem: Acer Incorporated [ALI] Device [1025:006a]
InterpreterPath: /usr/bin/python2.7
MachineType: Acer TravelMate 2410
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdline: /usr/bin/python /usr/share/apport/apport-gpu-error-intel.py
ProcEnviron:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.38-2-generic root=UUID=263aecd1-0156-49f9-8d5e-99e8079b240f ro gfxpayload=true quiet splash vt.handoff=7
ProcKernelCmdLine_: BOOT_IMAGE=/boot/vmlinuz-2.6.38-2-generic root=UUID=263aecd1-0156-49f9-8d5e-99e8079b240f ro gfxpayload=true quiet splash vt.handoff=7
RelatedPackageVersions:
 xserver-xorg 1:7.6~3ubuntu3
 libdrm2 2.4.23-1ubuntu3
 xserver-xorg-video-intel 2:2.14.0-1ubuntu6
Renderer: Hardware acceleration
SourcePackage: xserver-xorg-video-intel
Title: [i915gm] GPU lockup 82856c05
UserGroups:

dmi.bios.date: 02/07/2006
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: V1.09
dmi.board.name: Morar
dmi.board.vendor: Acer
dmi.board.version: Rev
dmi.chassis.asset.tag: None
dmi.chassis.type: 10
dmi.chassis.vendor: Acer
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvrV1.09:bd02/07/2006:svnAcer:pnTravelMate2410:pvr0100:rvnAcer:rnMorar:rvrRev:cvnAcer:ct10:cvrN/A:
dmi.product.name: TravelMate 2410
dmi.product.version: 0100
d...


Created attachment 43065
i915_error_state.txt

Created attachment 43066
BootDmesg.txt

Created attachment 43067
CurrentDmesg.txt

Created attachment 43068
XorgLog.txt

Created attachment 43069
XorgLogOld.txt

Bryce Harrington (bryce) wrote :

mkis62 - I've forwarded this bug upstream to https://bugs.freedesktop.org//show_bug.cgi?id=34014 - please subscribe yourself to this bug, in case they need further information or wish you to test something. Thanks ahead of time!

Changed in xserver-xorg-video-intel:
importance: Unknown → High
status: Unknown → Confirmed

*** Bug 34015 has been marked as a duplicate of this bug. ***

This patch would confirm my hypothesis that this is an invalid unfenced alignment:

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f136899..c970b81 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1416,6 +1416,7 @@ i915_gem_get_unfenced_gtt_alignment(struct drm_i915_gem_ob
            obj->tiling_mode == I915_TILING_NONE)
                return 4096;

+       return i915_gem_get_gtt_size(obj);
        /*
         * Older chips need unfenced tiled buffers to be aligned to the left
         * edge of an even tile row (where tile rows are counted as if the bo is

Bryce Harrington (bryce) wrote :

From the upstream bug report, they suggest testing with the following kernel patch:

This patch would confirm my hypothesis that this is an invalid unfenced alignment:

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f136899..c970b81 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1416,6 +1416,7 @@ i915_gem_get_unfenced_gtt_alignment(struct drm_i915_gem_ob
            obj->tiling_mode == I915_TILING_NONE)
                return 4096;

+       return i915_gem_get_gtt_size(obj);
        /*
         * Older chips need unfenced tiled buffers to be aligned to the left
         * edge of an even tile row (where tile rows are counted as if the bo is

tags: added: kernel-key
Changed in linux (Ubuntu):
status: New → Triaged
Bryce Harrington (bryce) wrote :

Given the kernel patch from upstream, that seems to indicate this is going to require fixing on the kernel side. I've notified the kernel team and will close out the X task at this time.

If for some reason it turns out this does need a patch on the X side, please reopen the xserver-xorg-video-intel task.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Triaged → Invalid
Andy Whitcroft (apw) on 2011-02-08
Changed in linux (Ubuntu):
assignee: nobody → Andy Whitcroft (apw)
importance: Undecided → Medium
Bryce Harrington (bryce) wrote :

From the dmesg of one of the dupes:

[ 4635.189079] exe (9084): /proc/9084/oom_adj is deprecated, please use /proc/9084/oom_score_adj instead.
[ 6000.472355] do_general_protection: 12 callbacks suppressed
[ 6000.472362] exaile[11471] general protection ip:92e6d7 sp:bf9f9ac0 error:0 in libglib-2.0.so.0.2793.0[8cf000+d5000]
[ 9193.006644] python[17260]: segfault at 30 ip 0380c483 sp bf999d70 error 4 in libgstreamer-0.10.so.0.28.0[37ca000+c3000]
[ 9644.824026] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 9644.826731] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 4693919 at 4693917, next 4693920)
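
The i915_do_wait_request line recurs in every report on this bug, so it can be handy to pull the numbers out mechanically. The sketch below parses the message as it is printed in this thread; the return value -11 corresponds to -EAGAIN on Linux. Reading "awaiting X at Y" as "waiting for request X while the GPU has only completed up to Y" is an interpretation, and the function and field names are hypothetical.

```python
import re

# Pattern for the i915_do_wait_request message as it appears in this report.
WAIT_RE = re.compile(
    r"i915_do_wait_request returns (-?\d+) "
    r"\(awaiting (\d+) at (\d+), next (\d+)\)"
)


def parse_wait_request(line):
    """Extract the return code and sequence numbers, or None if no match."""
    match = WAIT_RE.search(line)
    if match is None:
        return None
    ret, awaiting, at, nxt = (int(group) for group in match.groups())
    return {
        "ret": ret,
        "awaiting": awaiting,
        "at": at,
        "next": nxt,
        # How many requests lie between the last completed one and the
        # one being waited for (an interpretation, see the note above).
        "behind": awaiting - at,
    }


line = (
    "[ 9644.826731] [drm:i915_do_wait_request] *ERROR* "
    "i915_do_wait_request returns -11 "
    "(awaiting 4693919 at 4693917, next 4693920)"
)
info = parse_wait_request(line)
print(info)
```

Running the same function over the other dmesg excerpts in this thread gives the same gap of two between the awaited and the current sequence number.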

Bryce Harrington (bryce) wrote :

My comment on that bug:

"""
Hmm, from the dmesg output it sounds sort of like your system was already 'sick' before X hung. Perhaps an out-of-memory situation?

Could you please explain more about the conditions leading to this system freeze? And have you seen it more often than just this one time?

Is this a freshly installed system, or did you upgrade from an earlier Ubuntu?
"""

His reply:
"""
Thanks for your reply.
Some details:
after the last week's updates Exaile crashed >>>

INFO : Loading Exaile 0.3.2.0...
INFO : Loading settings...
** Message: pygobject_register_sinkfunc is deprecated (GstObject)
INFO : Loading plugins...
INFO : Attempting to connect to AudioScrobbler (http://post.audioscrobbler.com/)
INFO : Logged in successfully to AudioScrobbler (http://post.audioscrobbler.com/)
INFO : Connected to AudioScrobbler
INFO : Loading collection...
INFO : Loading devices...
INFO : Loading interface...
INFO : Loading main window...
INFO : Connecting main window events...
INFO : Loading panels...
INFO : Connecting panel events...
Traceback (most recent call last):
  File "/usr/lib/exaile/exaile.py", line 52, in <module> main()
  File "/usr/lib/exaile/exaile.py", line 49, in main exaile = main.Exaile()
  File "/usr/lib/exaile/xl/main.py", line 96, in __init__ self.__init()
  File "/usr/lib/exaile/xl/main.py", line 220, in __init self.gui = xlgui.Main(self)
  File "/usr/lib/exaile/xlgui/__init__.py", line 124, in __init__ self.main._connect_panel_events()
  File "/usr/lib/exaile/xlgui/main.py", line 927, in _connect_panel_events
    panel.connect('append-items', lambda panel, items, sort=sort:
TypeError: <PlaylistsPanel object at 0x99d47fc (xlgui+panel+playlists+PlaylistsPanel at 0xbbb04b0)>: unknown signal name: append-items
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>.
After this happened, I tried Decibel Audio Player. It crashed 4 times: twice with an X breakdown (tty1-5 still active) and twice with a total system freeze.
All crashes occurred when I tried to modify some preferences: importing directories, modifying settings like IM output...
Maybe the problem is somehow Python-related ... waiting for upgrades

The current system is a result of continuous upgrades since ubuntu 9.10 (if I remember well)

Regards,
"""

Bryce Harrington (bryce) wrote :

Hmm, would be curious if the audio player causes an out of memory situation that crashes X.

Are you able to reproduce this with a specific set of steps? If so could you enumerate them here? If the issue is caused by the audio player, then that might help narrow down what it is doing to cause it.

Andy Whitcroft (apw) wrote :

OK, I have built some Natty kernels with that DEBUG patch applied (note this is not a fix, merely a mechanism to confirm the root cause). Could those of you affected please test out these kernels (they should work on Maverick too) and report back here. The kernels are at the URL below:

    http://people.canonical.com/~apw/lp714719-natty/

Thanks.

Andy Whitcroft (apw) on 2011-02-10
Changed in linux (Ubuntu):
status: Triaged → Incomplete

We packaged this patch into a kernel for the bug reporter to test:

   http://people.canonical.com/~apw/lp714719-natty/

We have not heard back from him in a couple of weeks.

However, we asked other bug reporters with vaguely similar lockups to test as well, and this past weekend one of them tested it and provided the following dmesg after reproducing a lockup.

   https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/718767/+attachment/1861287/+files/dmesg.txt

Hmm, I think I'm seeing this too on my X41T:

Recently upgraded Debian and kernel and got gpu hangs again.
I upgraded to latest libdrm2 and xf86-video-intel, but still getting gpu hangs.
Especially chrome seems to have a knack for causing these (aggressive use of acceleration features I guess).

Linux navi 2.6.38-rc7 #64 PREEMPT Sun Mar 6 14:32:50 CET 2011 i686 GNU/Linux

ii libdrm2 2.4.24-1 Userspace interface to kernel DRM services -
ii xserver-xorg-v 2:2.14.901-1 X.Org X server -- Intel i8xx, i9xx display d

(Both built myself from newest upstream packages released last week).

intel_gpu_dump:
ACTHD: 0xffffffff
EIR: 0x00000000
EMR: 0xffffffed
ESR: 0x00000001
PGTBL_ER: 0x00000000
IPEHR: 0x02000004
IPEIR: 0x00000000
INSTDONE: 0x038ff8c1
    busy: IDCT
    busy: IQ
    busy: PR
    busy: VLD
    busy: Instruction parser
    busy: Setup engine
    busy: Windowizer
    busy: Intermediate Z
    busy: Bypass FIFO
    busy: Pixel shader
    busy: Color calculator
Ringbuffer: Reminder: head pointer is GPU read, tail pointer is CPU write
ringbuffer at 0x00000000:
(copy&paste from terminal, forgot to redirect into file before resetting the gpu with a suspend-resume cycle).

dmesg:
[29103.032023] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[29103.032023] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 1775973 at 1775971, next 1775974)
[29103.032023] [drm:i915_reset] *ERROR* Failed to reset chip.

00:02.0 VGA compatible controller: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 03)

00:02.0 0300: 8086:2592 (rev 03)
00:02.1 0380: 8086:2792 (rev 03)

Vendor: 0x8086, Device: 0x2592, Revision: 0x03 (B1/C0)

BTW, while a suspend-resume should reset the gpu, I see this:

[31055.564022] [drm] Manually setting wedged to 0
[31055.564022] [drm:i915_reset] *ERROR* Failed to reset chip.
Why does it fail?
The units are not busy anymore according to intel_gpu_top, so I'd expect "echo 0 > /sys/kernel/debug/dri/0/i915_wedged" should unwedge it, but it doesn't

Created attachment 44183
i915 dump after s2mem (tried to recover from wedged gpu), but i915 claims it still can't reset the gpu

(In reply to comment #11)
> BTW, while a suspend-resume should reset the gpu, I see this:
>
> [31055.564022] [drm] Manually setting wedged to 0
> [31055.564022] [drm:i915_reset] *ERROR* Failed to reset chip.
> Why does it fail?

It fails because we have not found the means to successfully reset that chipset yet. It may well be the only way is to power cycle the PCI device. Meh.

> The units are not busy anymore according to intel_gpu_top, so I'd expect "echo
> 0 > /sys/kernel/debug/dri/0/i915_wedged" should unwedge it, but it doesn't

The units are idle because the chip hit a fatal error and disabled those units.

(In reply to comment #13)
> (In reply to comment #11)
> > BTW, while a suspend-resume should reset the gpu, I see this:
> >
> > [31055.564022] [drm] Manually setting wedged to 0
> > [31055.564022] [drm:i915_reset] *ERROR* Failed to reset chip.
> > Why does it fail?
>
> It fails because we have not found the means to successfully reset that chipset
> yet. It may well be the only way is to power cycle the PCI device. Meh.
>
> > The units are not busy anymore according to intel_gpu_top, so I'd expect "echo
> > 0 > /sys/kernel/debug/dri/0/i915_wedged" should unwedge it, but it doesn't
>
> The units are idle because the chip hit a fatal error and disabled those units.

I don't think so. They are only idle after coming back out of suspend to ram, so I think it's probably because the GPU was power-cycled.
Both resume from disk and resume from ram have the same effect here.
I think it would be very helpful if KMS/DRM could recover from the GPU hang after suspend to ram or suspend to disk, when the GPU has been power-cycled. It used to be the case that 'echo 1 > i915_wedged' would restart the driver after resume, but it seems some internals have changed so that this no longer works. Being able to recover in this case would avoid the need to completely reboot the system.

*** Bug 34948 has been marked as a duplicate of this bug. ***

Created attachment 44468
i915_error_state from #34948

Attaching another i915_error_state variant.

Can you give drm-intel-staging, and in particular,

commit 0faba0d4e49361886b16c703995a3477951b14e5
Author: Chris Wilson <email address hidden>
Date: Thu Mar 17 15:23:22 2011 +0000

    drm/i915: Fix tiling corruption from pipelined fencing

    ... even though it was disabled. A mistake in the handling of fence reuse
    caused us to skip the vital delay of waiting for the object to finish
    rendering before changing the register.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34584
    Cc: Andy Whitcroft <email address hidden>
    Cc: Daniel Vetter <email address hidden>
    Reviewed-by: Daniel Vetter <email address hidden>
    [Note for 2.6.38-stable, we need to reintroduce the interruptible passing]
    Signed-off-by: Chris Wilson <email address hidden>

a whirl?

Working on the theory that it is one and the same bug:

commit b5b5ac2dec49ea5ae033434efa90863aa5cdfb2c
Author: Chris Wilson <email address hidden>
Date: Thu Mar 17 15:23:22 2011 +0000

    drm/i915: Fix tiling corruption from pipelined fencing

    ... even though it was disabled. A mistake in the handling of fence reuse
    caused us to skip the vital delay of waiting for the object to finish
    rendering before changing the register.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34584
    Cc: Andy Whitcroft <email address hidden>
    Cc: Daniel Vetter <email address hidden>
    Reviewed-by: Daniel Vetter <email address hidden>
    [Note for 2.6.38-stable, we need to reintroduce the interruptible passing]
    Signed-off-by: Chris Wilson <email address hidden>
    Tested-by: Dave Airlie <email address hidden>

Bryce Harrington (bryce) wrote :

Upstream believes this is fixed with the following commit:

commit b5b5ac2dec49ea5ae033434efa90863aa5cdfb2c
Author: Chris Wilson <email address hidden>
Date: Thu Mar 17 15:23:22 2011 +0000

    drm/i915: Fix tiling corruption from pipelined fencing

Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released

Original reporter tested a kernel that includes commit b5b5ac2d and says he still sees the hang:

David Coggins wrote on 2011-03-20:
The system froze for me testing the latest natty 2.6.38-7.36 which should incorporate the fix for bug 717114

drm/i915: Fix tiling corruption from pipelined fencing

Mar 21 11:29:13 eee kernel: [ 0.000000] Linux version 2.6.38-7-generic (buildd@roseapple) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-6ubuntu4) ) #36-Ubuntu SMP Fri Mar 18 22:05:25 UTC 2011 (Ubuntu 2.6.38-7.36-generic 2.6.38)

Mar 21 11:47:30 eee kernel: [ 1115.992048] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 21 11:47:30 eee kernel: [ 1115.998408] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 110179 at 110177, next 110180)

Apport is not generating a problem popup when I next reboot at the moment.
A small amount of testing with the terminal does not show any of the corruption I was seeing 2 weeks ago (bug 717114).

*** Bug 35608 has been marked as a duplicate of this bug. ***

Created attachment 44880
i915_error_state from #35608

*** Bug 35647 has been marked as a duplicate of this bug. ***

Created attachment 44881
i915_error_state from #35647

Changed in xserver-xorg-video-intel:
status: Fix Released → Confirmed

*** Bug 36000 has been marked as a duplicate of this bug. ***

Created attachment 45335
i915_error_state from #36000

I suspect that this bug is related to Bug 36147

Test if reverting commit cc930a37612341a1f2457adb339523c215879d82 helps.

Bryce, I'm confident that Knut identified the same issue and so disabling relaxed-fencing for the release should fix these as well. (I have lingering doubts since we tried the obvious kernel workarounds, but then again I think we may have a fundamental bug in our allocation ala gen2.) Obviously, if I am wrong, let's open the bug again.

commit 686018f283f1d131073ef5917213e6a8ac013f26
Author: Chris Wilson <email address hidden>
Date: Tue Apr 12 08:23:04 2011 +0100

    Turn relaxed-fencing off by default for older (pre-G33) chipsets

    There are still too many unresolved bugs, typically GPU hangs, that are
    related to using relaxed fencing (i.e. only allocating the minimal
    amount of memory required for a buffer) on older hardware, so turn off
    the feature by default for the release.

    Reported-and-tested-by: Knut Petersen <email address hidden>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36147
    Signed-off-by: Chris Wilson <email address hidden>
    Acked-by: Daniel Vetter <email address hidden>

I can't look too deeply into it right now but it looks like this hasn't fixed it for me. The xf86-video-intel I built definitely included that commit and I was running 2.6.38.2.

Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released

Reopening, though I'm not sure if Cuirot is the reporter.

Chris, if it does fix, I'd suggest marking dup as resolution.

If we're going to use surnames, it's Le Cuirot please!

I'm not the reporter and I'm not 100% sure that my issue is the same but it is very telling that all these similar bug reports sprung up around the same time.

I would do a bisect but it's my wife's laptop and I haven't found a quick way to reproduce the issue. It usually occurs around 15 minutes into using Chromium. If someone could suggest a reliable way to reproduce it (like a GPU stress tester?) then I'll give it a try.

Changed in xserver-xorg-video-intel:
status: Fix Released → Confirmed

Still happening on 2.6.39. :(

Created attachment 48884
Use full-fence size for alignment on pre-G33

The complication was that a second bug stopped the original patch from preventing the misalignment of the buffers.

Patch posted for inclusion.

commit e28f87116503f796aba4fb27d81e2c3d81966174
Author: Chris Wilson <email address hidden>
Date: Mon Jul 18 13:11:49 2011 -0700

    drm/i915: Fix unfenced alignment on pre-G33 hardware

    Align unfenced buffers on older hardware to the power-of-two object
    size. The docs suggest that it should be possible to align only to a
    power-of-two tile height, but using the already computed fence size is
    easier and always correct. We also have to make sure that we unbind
    misaligned buffers upon tiling changes.

    In order to prevent a repetition of this bug, we change the interface
    to the alignment computation routines to force the caller to provide
    the requested alignment and size of the GTT binding rather than assume
    the current values on the object.

    Reported-and-tested-by: Sitosfe Wheeler <email address hidden>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36326
    Signed-off-by: Chris Wilson <email address hidden>
    Cc: <email address hidden>
    Reviewed-by: Daniel Vetter <email address hidden>
    Signed-off-by: Keith Packard <email address hidden>
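
The alignment rule described in the commit message above can be illustrated with a small sketch. It assumes the fence size is the smallest power of two that covers the object, starting at 1 MiB on gen3 and 512 KiB on gen2; that starting size, the `gen` parameter and all function names here are assumptions made for illustration, not taken from the patch itself.

```python
PAGE_SIZE = 4096


def fence_size(obj_size, gen=3):
    """Power-of-two fence region size for a tiled object (assumed rule).

    Assumption: starts at 1 MiB on gen3 (512 KiB on gen2) and doubles
    until it covers the object.
    """
    size = 1024 * 1024 if gen == 3 else 512 * 1024
    while size < obj_size:
        size <<= 1
    return size


def unfenced_alignment(obj_size, tiled, gen=3):
    """Full fence size for tiled buffers, a single page for untiled ones."""
    return fence_size(obj_size, gen) if tiled else PAGE_SIZE


def is_aligned(offset, alignment):
    """True if a GTT offset is a multiple of the required alignment."""
    return offset % alignment == 0


# A 1024x768 buffer at 4 bytes per pixel, the panel mode in this report.
fb_size = 1024 * 768 * 4
align = unfenced_alignment(fb_size, tiled=True)

print(hex(align))
print(is_aligned(0x00400000, align))
print(is_aligned(0x00300000, align))
```

In this sketch a 3 MiB tiled buffer needs 4 MiB alignment, so an offset of 0x00300000 would be rejected even though it is page-aligned. That is the kind of placement the fix is meant to rule out on pre-G33 hardware.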

Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released