[arrandale] GPU lockup render.IPEHR: 0xff4c4c4c

Bug #845376 reported by PSN
This bug affects 1 person
Affects                            Status        Importance  Assigned to  Milestone
xf86-video-intel                   Invalid       High
xserver-xorg-video-intel (Ubuntu)  Fix Released  High        Unassigned

Bug Description

Happened after closing gl-117 in 1024x768 mode while the default resolution is 1366x768. The resolution failed to switch back to the default and the system hung, so I had to hard reset; afterwards I was shown that gnome-shell had started in fallback mode.

ProblemType: Crash
DistroRelease: Ubuntu 11.10
Package: xserver-xorg-video-intel 2:2.15.901-1ubuntu2
ProcVersionSignature: Ubuntu 3.0.0-10.16-generic 3.0.4
Uname: Linux 3.0.0-10-generic i686
.tmp.unity.support.test.0:

.tmp.unity.support.test.1:

ApportVersion: 1.22.1-0ubuntu2
Architecture: i386
Chipset: arrandale
CompizPlugins: [core,bailer,detection,composite,opengl,decor,mousepoll,vpswitch,regex,animation,snap,expo,move,compiztoolbox,place,grid,imgpng,gnomecompat,wall,ezoom,workarounds,resize,fade,unitymtgrabhandles,scale,session,unityshell]
CompositorRunning: None
Date: Fri Sep 9 11:23:01 2011
DistUpgraded: Fresh install
DistroCodename: oneiric
DistroVariant: ubuntu
DuplicateSignature: [arrandale] GPU lockup render.IPEHR: 0xff4c4c4c Ubuntu 11.10
ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
GpuHangFrequency: This is the first time
GpuHangReproducibility: I don't know
GpuHangStarted: Today
GraphicsCard:
 Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 18) (prog-if 00 [VGA controller])
   Subsystem: Acer Incorporated [ALI] Device [1025:0482]
InstallationMedia: Ubuntu 11.10 "Oneiric Ocelot" - Beta i386 (20110901)
InterpreterPath: /usr/bin/python2.7
MachineType: Acer Aspire 4738
ProcCmdline: /usr/bin/python /usr/share/apport/apport-gpu-error-intel.py
ProcEnviron:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.0.0-10-generic root=UUID=87efc057-537b-4925-ad54-7972f88dfef5 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 xserver-xorg 1:7.6+7ubuntu6
 libdrm2 2.4.26-1ubuntu1
 xserver-xorg-video-intel 2:2.15.901-1ubuntu2
SourcePackage: xserver-xorg-video-intel
Title: [arrandale] GPU lockup render.IPEHR: 0xff4c4c4c
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

dmi.bios.date: 04/11/2011
dmi.bios.vendor: INSYDE
dmi.bios.version: V1.15
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: JE41_CP
dmi.board.vendor: Acer
dmi.board.version: Base Board Version
dmi.chassis.type: 10
dmi.chassis.vendor: Chassis Manufacturer
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnINSYDE:bvrV1.15:bd04/11/2011:svnAcer:pnAspire4738:pvrV1.15:rvnAcer:rnJE41_CP:rvrBaseBoardVersion:cvnChassisManufacturer:ct10:cvrChassisVersion:
dmi.product.name: Aspire 4738
dmi.product.version: V1.15
dmi.sys.vendor: Acer
version.compiz: compiz 1:0.9.5.92+bzr2791-0ubuntu2
version.libdrm2: libdrm2 2.4.26-1ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 7.11-0ubuntu3
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 7.11-0ubuntu3
version.xserver-xorg: xserver-xorg 1:7.6+7ubuntu6
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.6.0-1ubuntu13
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:6.14.99~git20110811.g93fc084-0ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.15.901-1ubuntu2
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:0.0.16+git20110411+8378443-1

PSN (psndna88) wrote:
tags: removed: need-duplicate-check
In freedesktop.org bug #41098, Bryce Harrington (bryce) wrote:

Forwarding this bug from Ubuntu reporter PSN:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/845376

[Problem]
Yet another GPU lockup, this one with GNOME Shell, and with an unusual error code:
  render.IPEHR: 0xff4c4c4c
This is the only report we've seen with this error code, and so far the user has experienced it only once, on a fresh install.

[Original Description]
Happened after closing gl-117 in 1024x768 mode while the default resolution is 1366x768. The resolution failed to switch back to the default and the system hung, so I had to hard reset; afterwards I was shown that gnome-shell had started in fallback mode.

Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Confirmed
importance: Undecided → High
In freedesktop.org bug #41098, Bryce Harrington (bryce) wrote:

Created attachment 51483
XorgLog.txt

In freedesktop.org bug #41098, Bryce Harrington (bryce) wrote:

Created attachment 51484
i915_error_state.txt

In freedesktop.org bug #41098, Bryce Harrington (bryce) wrote:

Created attachment 51485
CurrentDmesg.txt

In freedesktop.org bug #41098, Bryce Harrington (bryce) wrote:

Created attachment 51486
BootDmesg.txt

Bryce Harrington (bryce) wrote:

PSN, I've forwarded this bug upstream to http://bugs.freedesktop.org/show_bug.cgi?id=41098. Please subscribe yourself to that bug in case they need further information or want you to test something. Thanks in advance!

Changed in xserver-xorg-video-intel (Ubuntu):
status: Confirmed → Triaged
Changed in xserver-xorg-video-intel:
importance: Unknown → High
status: Unknown → Confirmed
In freedesktop.org bug #41098, Daniel-ffwll (daniel-ffwll) wrote:

Well, that's not a batchbuffer the GPU is trying to execute; it's just a pile of RGBA pixels. Hence the strange IPEHR code.

No idea where these pixels are coming from. We've only seen this on SNB, and blamed it on semaphores not correctly syncing the batches.
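A minimal Python sketch of Daniel's point, reading the IPEHR dword two ways. It assumes the dword is a 32-bit ARGB8888 pixel (alpha in the top byte) and, separately, an Intel GPU command header whose bits 31:29 select the command client (0 = MI, 2 = 2D/BLT, 3 = 3D/render, per my reading of the public programmer's reference manuals; treat that mapping as an assumption). The function names are hypothetical.

```python
def decode_argb8888(dword):
    """Split a 32-bit value into (alpha, red, green, blue) bytes."""
    return ((dword >> 24) & 0xFF, (dword >> 16) & 0xFF,
            (dword >> 8) & 0xFF, dword & 0xFF)


def command_client(dword):
    """Bits 31:29 of a GPU command header (assumed client/type field)."""
    return (dword >> 29) & 0x7


ipehr = 0xFF4C4C4C
a, r, g, b = decode_argb8888(ipehr)
print(f"pixel: alpha={a:#04x} rgb=({r}, {g}, {b})")
print(f"command client field: {command_client(ipehr)}")
```

Read as a pixel, 0xff4c4c4c is an opaque dark grey (76, 76, 76); read as a command header, its client field is 7, which is none of the clients listed above, so the command parser has nothing sensible to execute.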

In freedesktop.org bug #41098, Chris Wilson (ickle) wrote:

Well, it was a batchbuffer. Its cache domains prove that, the last time the CPU saw it, it was inactive and ready to execute...

With the split rings on SNB, it is much easier to trick the GPU into overwriting memory queued for execution on another ring. On ILK, it requires the GPU to overwrite a bo that has already been flushed. One way this could happen is if userspace used an absolute relocation, which is very unlikely.

I have seen such corrupt batches caused by very short-lived ddx bugs (e.g. marking cache domains incorrectly or not clipping drawing commands correctly) and by tiling issues with pipelined fencing and map_and_fenceable. None of these seem to apply here.

In freedesktop.org bug #41098, Daniel-ffwll (daniel-ffwll) wrote:

There's something pretty odd about this error_state. The batchbuffer containing the RGBA values has read_domains = GTT | INSTRUCTION | COMMAND | SAMPLER, write_domains = 0.

The first three read domains are OK (and should be like that), but SAMPLER makes no sense. We probably don't want to sample a batchbuffer, and the write that turned that bo into something worth sampling (or executing, depending upon ordering) should have invalidated the other domains. This is fishy.
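To make the domain discussion concrete, here is a small Python sketch that decodes a read_domains bitmask into names. The flag values are the I915_GEM_DOMAIN_* constants as I recall them from the kernel's i915_drm.h UAPI header; verify them against your own kernel headers before relying on them. The helper name decode_domains is hypothetical.

```python
# Assumed I915_GEM_DOMAIN_* values from i915_drm.h (check your headers).
I915_GEM_DOMAINS = {
    "CPU": 0x01,
    "RENDER": 0x02,
    "SAMPLER": 0x04,
    "COMMAND": 0x08,
    "INSTRUCTION": 0x10,
    "VERTEX": 0x20,
    "GTT": 0x40,
}


def decode_domains(mask):
    """Return the names of the domain bits set in mask, lowest bit first."""
    ordered = sorted(I915_GEM_DOMAINS.items(), key=lambda kv: kv[1])
    return [name for name, bit in ordered if mask & bit]


# The combination reported in the error_state.
read_domains = (I915_GEM_DOMAINS["GTT"] | I915_GEM_DOMAINS["INSTRUCTION"]
                | I915_GEM_DOMAINS["COMMAND"] | I915_GEM_DOMAINS["SAMPLER"])
print(hex(read_domains), decode_domains(read_domains))
```

With these assumed values the reported state is the mask 0x5c, and SAMPLER is the one bit in it that looks out of place for a buffer queued as a batch.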

In freedesktop.org bug #41098, Chris Wilson (ickle) wrote:

SAMPLER is (or was) used for the surface binding table in Mesa, and the surface binding table is embedded at the tail of the batchbuffer. So yes, it could legitimately be moved to the SAMPLER domain following the batchbuffer pwrite.

In freedesktop.org bug #41098, Chris Wilson (ickle) wrote:

My current favourite hypothesis is a wild write from one of the intervening batches, and we have a mix of ddx/dri suspects.

In freedesktop.org bug #41098, Chris Wilson (ickle) wrote:

Bug 41102 is similar. That looks like a batch buffer clobbered by BLT?

Bryce Harrington (bryce) wrote:

Hi, just checking back in for status. Have you seen more of these lockups since the release or within the last few weeks?

Changed in xserver-xorg-video-intel (Ubuntu):
status: Triaged → Incomplete
PSN (psndna88) wrote: RE: [Bug 845376] [arrandale] GPU lockup render.IPEHR: 0xff4c4c4c

It hasn't happened since the final release.

-----Original Message-----
From: Bryce Harrington
Sent: 26-10-2011 03:59:39
Subject: [Bug 845376] Re: [arrandale] GPU lockup render.IPEHR: 0xff4c4c4c

Bryce Harrington (bryce)
Changed in xserver-xorg-video-intel (Ubuntu):
status: Incomplete → Fix Released
In freedesktop.org bug #41098, Chris Wilson (ickle) wrote:

I believe this is related to:

commit c501ae7f332cdaf42e31af30b72b4b66cbbb1604
Author: Chris Wilson <email address hidden>
Date: Wed Dec 14 13:57:23 2011 +0100

    drm/i915: Only clear the GPU domains upon a successful finish

    By clearing the GPU read domains before waiting upon the buffer, we run
    the risk of the wait being interrupted and the domains prematurely
    cleared. The next time we attempt to wait upon the buffer (after
    userspace handles the signal), we believe that the buffer is idle and so
    skip the wait.

    There are a number of bugs across all generations which show signs of an
    overly haste reuse of active buffers.

    Such as:

      https://bugs.freedesktop.org/show_bug.cgi?id=29046
      https://bugs.freedesktop.org/show_bug.cgi?id=35863
      https://bugs.freedesktop.org/show_bug.cgi?id=38952
      https://bugs.freedesktop.org/show_bug.cgi?id=40282
      https://bugs.freedesktop.org/show_bug.cgi?id=41098
      https://bugs.freedesktop.org/show_bug.cgi?id=41102
      https://bugs.freedesktop.org/show_bug.cgi?id=41284
      https://bugs.freedesktop.org/show_bug.cgi?id=42141

    A couple of those pre-date i915_gem_object_finish_gpu(), so may be
    unrelated (such as a wild write from a userspace command buffer), but
    this does look like a convincing cause for most of those bugs.

    Signed-off-by: Chris Wilson <email address hidden>
    Cc: <email address hidden>
    Reviewed-by: Daniel Vetter <email address hidden>
    Reviewed-by: Eugeni Dodonov <email address hidden>
    Signed-off-by: Daniel Vetter <email address hidden>
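The ordering bug the commit describes can be modelled in a few lines. This is a hedged Python simulation, not the kernel code: FakeBuffer, wait_rendering and the two finish functions are hypothetical stand-ins loosely modelled on i915_gem_object_finish_gpu() as the commit message describes it, and InterruptedWait plays the role of a signal interrupting the wait. The buggy version clears the GPU read domains before waiting, so the retry after the signal concludes the buffer is idle and skips the wait; the fixed version clears them only after the wait succeeds.

```python
class InterruptedWait(Exception):
    """Stands in for a signal arriving while we wait on the GPU."""


class FakeBuffer:
    def __init__(self):
        self.gpu_read_domains = 0x08  # hypothetical non-zero GPU read domain
        self.busy = True              # the GPU has not finished with it yet


def wait_rendering(buf, interrupt):
    """Hypothetical interruptible wait; marks the buffer idle on success."""
    if interrupt:
        raise InterruptedWait()
    buf.busy = False


def finish_gpu_buggy(buf, interrupt=False):
    if buf.gpu_read_domains == 0:
        return                        # believed idle: skip the wait
    buf.gpu_read_domains = 0          # cleared too early
    wait_rendering(buf, interrupt)


def finish_gpu_fixed(buf, interrupt=False):
    if buf.gpu_read_domains == 0:
        return
    wait_rendering(buf, interrupt)
    buf.gpu_read_domains = 0          # cleared only after a successful wait


def retry_after_signal(finish):
    """Interrupt the first wait, retry, and report whether the buffer is still busy."""
    buf = FakeBuffer()
    try:
        finish(buf, interrupt=True)   # first attempt hit by a signal
    except InterruptedWait:
        pass                          # userspace handles the signal, then retries
    finish(buf, interrupt=False)      # second attempt
    return buf.busy                   # True: a busy buffer would be reused


print("buggy reuses busy buffer:", retry_after_signal(finish_gpu_buggy))
print("fixed reuses busy buffer:", retry_after_signal(finish_gpu_fixed))
```

Here retry_after_signal(finish_gpu_buggy) returns True, the premature reuse of an active buffer that the commit describes, while retry_after_signal(finish_gpu_fixed) returns False.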

In freedesktop.org bug #41098, Gordon Jin (gordon-jin) wrote:

Marking as a duplicate to show the relationship.

In freedesktop.org bug #41098, Gordon Jin (gordon-jin) wrote:

*** This bug has been marked as a duplicate of bug 29046 ***

Changed in xserver-xorg-video-intel:
status: Confirmed → Invalid