[ivb] GPU lockup IPEHR: 0xffffffff upon context restore

Bug #1152165 reported by Charles Lease
This bug affects 2 people
Affects                              Status         Importance   Assigned to   Milestone
xf86-video-intel                     Invalid        Medium
xserver-xorg-video-intel (Ubuntu)    Fix Released   Low          Unassigned

Bug Description

Had SuperTuxKart open and Chromium running in the background; then computer locked up.

ProblemType: Crash
DistroRelease: Ubuntu 13.04
Package: xserver-xorg-video-intel 2:2.21.3-0ubuntu1
ProcVersionSignature: Ubuntu 3.8.0-11.20-generic 3.8.2
Uname: Linux 3.8.0-11-generic x86_64
.tmp.unity.support.test.0:

ApportVersion: 2.9-0ubuntu2
Architecture: amd64
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: compiz
CompositorUnredirectDriverBlacklist: '(nouveau|Intel).*Mesa 8.0'
CompositorUnredirectFSW: true
Date: Thu Mar 7 08:09:00 2013
DistUpgraded: Fresh install
DistroCodename: raring
DistroVariant: ubuntu
DkmsStatus:
 virtualbox, 4.1.22, 3.8.0-11-generic, x86_64: installed
 virtualbox, 4.1.22, 3.8.0-4-generic, x86_64: installed
 virtualbox, 4.1.22, 3.8.0-6-generic, x86_64: installed
 virtualbox, 4.1.22, 3.8.0-7-generic, x86_64: installed
 virtualbox, 4.1.22, 3.8.0-8-generic, x86_64: installed
DuplicateSignature: GPU lockup IPEHR: 0xffffffff IPEHR: 0x0b140001 Ubuntu 13.04
ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
ExtraDebuggingInterest: Yes, if not too technical
GpuHangFrequency: This is the first time
GraphicsCard:
 Intel Corporation 3rd Gen Core processor Graphics Controller [8086:0166] (rev 09) (prog-if 00 [VGA controller])
   Subsystem: Apple Inc. Device [106b:00fd]
InstallationDate: Installed on 2013-01-18 (48 days ago)
InstallationMedia: Ubuntu 13.04 "Raring Ringtail" - Alpha amd64+mac (20130117)
InterpreterPath: /usr/bin/python3.3
MachineType: Apple Inc. MacBookAir5,1
MarkForUpload: True
ProcCmdline: /usr/bin/python3 /usr/share/apport/apport-gpu-error-intel.py
ProcEnviron:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.8.0-11-generic root=UUID=ceb00db8-a161-4759-80d9-a9fce83fe29f ro i915.i915_enable_rc6=1
RelatedPackageVersions:
 xserver-xorg 1:7.7+1ubuntu4
 libdrm2 2.4.42-0ubuntu1
 xserver-xorg-video-intel 2:2.21.3-0ubuntu1
SourcePackage: xserver-xorg-video-intel
Title: GPU lockup IPEHR: 0xffffffff IPEHR: 0x0b140001
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

dmi.bios.date: 11/27/2012
dmi.bios.vendor: Apple Inc.
dmi.bios.version: MBA51.88Z.00EF.B02.1211271028
dmi.board.asset.tag: Base Board Asset Tag#
dmi.board.name: Mac-66F35F19FE2A0D05
dmi.board.vendor: Apple Inc.
dmi.board.version: MacBookAir5,1
dmi.chassis.type: 10
dmi.chassis.vendor: Apple Inc.
dmi.chassis.version: Mac-66F35F19FE2A0D05
dmi.modalias: dmi:bvnAppleInc.:bvrMBA51.88Z.00EF.B02.1211271028:bd11/27/2012:svnAppleInc.:pnMacBookAir5,1:pvr1.0:rvnAppleInc.:rnMac-66F35F19FE2A0D05:rvrMacBookAir5,1:cvnAppleInc.:ct10:cvrMac-66F35F19FE2A0D05:
dmi.product.name: MacBookAir5,1
dmi.product.version: 1.0
dmi.sys.vendor: Apple Inc.
version.compiz: compiz 1:0.9.9~daily13.03.06-0ubuntu1
version.ia32-libs: ia32-libs N/A
version.libdrm2: libdrm2 2.4.42-0ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 9.0.2-0ubuntu1
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 9.0.2-0ubuntu1
version.xserver-xorg-core: xserver-xorg-core 2:1.13.2-0ubuntu3
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.7.3-0ubuntu2
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:7.1.0-0ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.21.3-0ubuntu1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.6-0ubuntu3
xserver.bootTime: Wed Mar 6 11:28:46 2013
xserver.configfile: default
xserver.errors:
 intel(0): Detected a hung GPU, disabling acceleration.
 intel(0): When reporting this, please include i915_error_state from debugfs and the full dmesg.
xserver.logfile: /var/log/Xorg.0.log
xserver.version: 2:1.13.2-0ubuntu2
xserver.video_driver: intel
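
The xserver.errors entry above asks for i915_error_state from debugfs and the full dmesg. A minimal Python sketch of gathering both follows. It assumes debugfs is mounted at /sys/kernel/debug and that the GPU is DRM card 0; the function name collect_gpu_hang_logs and the output file names are made up for illustration. Reading debugfs normally requires root.

```python
import subprocess
from pathlib import Path

# Assumed default location; the card index (0) may differ on some systems.
DEFAULT_ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def collect_gpu_hang_logs(out_dir, error_state=DEFAULT_ERROR_STATE,
                          dmesg_cmd=("dmesg",)):
    """Save the i915 error state and the kernel log into out_dir.

    Returns the list of files that were actually written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []

    src = Path(error_state)
    if src.is_file():
        dst = out / "i915_error_state.txt"
        # Read and rewrite instead of copying by size: debugfs files
        # report a length of 0 even when they have content.
        dst.write_bytes(src.read_bytes())
        saved.append(dst)

    try:
        result = subprocess.run(list(dmesg_cmd), capture_output=True,
                                text=True, check=False)
    except OSError:
        # The command is missing or not executable; skip the kernel log.
        result = None
    if result is not None and result.returncode == 0:
        dst = out / "dmesg.txt"
        dst.write_text(result.stdout)
        saved.append(dst)

    return saved
```

Calling collect_gpu_hang_logs("hang-report") right after a hang, before rebooting, should leave both files in the hang-report directory, ready to attach to the bug.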

Charles Lease (mellowchuck-y) wrote :
tags: removed: need-duplicate-check
Chris Wilson (ickle) wrote :

Garbage upon context restore.

Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Confirmed
summary: - GPU lockup IPEHR: 0xffffffff IPEHR: 0x0b140001
+ [ivb] GPU lockup IPEHR: 0xffffffff upon context restore
Chris Wilson (ickle) wrote :

If you do see this regularly and want to test something, please try applying this to your kernel:

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index a1e8ecb..b9b8917 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -315,7 +315,7 @@ mi_set_context(struct intel_ring_buffer *ring,
          * explicitly, so we rely on the value at ring init, stored in
          * itlb_before_ctx_switch.
          */
-        if (IS_GEN6(ring->dev) && ring->itlb_before_ctx_switch) {
+        if (ring->itlb_before_ctx_switch) {
                 ret = ring->flush(ring, I915_GEM_GPU_DOMAINS, 0);
                 if (ret)
                         return ret;

In , Ross Lagerwall (rosslagerwall) wrote :

System environment:
-- chipset: I'm not sure
-- system architecture: 64-bit
-- xf86-video-intel: 2.21.6
-- xserver: 1.14.0
-- mesa: 9.1.1
-- libdrm: 2.4.44
-- kernel: 3.9.0-rc8
-- Linux distribution: Fedora 19
-- Machine or mobo model: Intel Core i5-3570K with HD 4000
-- Display connector: VGA, single display

Reproducing steps:
From within SuperTuxKart, ensure that resolution is set to 1920x1080,
graphics detail level is set to 7 and Vertical Sync is set to ON,
Use Framebuffer Objects is set to ON, and full screen is set to ON.
About 10% of the time, when running the game with these settings, the
GPU hangs and occasionally after that Xorg crashes.
The hang can happen in the menu system or within the actual game.
When the hang occurs, the graphics may update very slowly or periodically
and then eventually it stops updating at all.
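
Because the hang shows up in only about 10% of runs, a few clean runs on a patched kernel are weak evidence of a fix. The sketch below works through the arithmetic, assuming each run is independent and the 10% figure is exact (both are assumptions, and the function names are illustrative only).

```python
import math


def clean_run_probability(hang_rate, runs):
    """Chance of seeing no hang at all in `runs` independent runs."""
    return (1.0 - hang_rate) ** runs


def runs_needed(hang_rate, confidence):
    """Smallest number of clean runs after which an unfixed bug would
    have hung at least once with probability >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hang_rate))


# With a 10% hang rate, 5 clean runs still happen about 59% of the time
# on an unfixed kernel, so they say little about whether a patch helped.
print(round(clean_run_probability(0.10, 5), 3))
# Roughly 29 clean runs are needed before being 95% sure the hang is gone.
print(runs_needed(0.10, 0.95))
```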

Additional info:
I'm not 100% sure, but I think it may have started after setting
Vertical Sync to ON.
After one of the crashes, I managed to capture the system log, the
Xorg log, and the i915_error_state. Unfortunately I was not running
with drm.debug=14.
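
For reference, drm.debug is a bitmask, and 14 (0x0e) selects every category except core. The small sketch below decodes it, assuming the bit assignments used by 3.x-era kernels (core 0x01, driver 0x02, KMS 0x04, PRIME 0x08); later kernels define more bits, and the helper name is illustrative only.

```python
# Debug categories as defined in 3.x-era kernels (assumption: later
# kernels add further bits that this table does not cover).
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
}


def decode_drm_debug(value):
    """Return the names of the debug categories enabled by `value`."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items())
            if value & bit]


print(decode_drm_debug(14))  # ['driver', 'kms', 'prime']
```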

Package versions:
Up to date Fedora 19 with:
mesa-libGL-9.1.1-1.fc19.x86_64
kernel-3.9.0-0.rc8.git0.2.fc19.x86_64
libdrm-2.4.44-2.fc19.x86_64
xorg-x11-server-Xorg-1.14.0-6.fc19.x86_64
xorg-x11-drv-intel-2.21.6-1.fc19.x86_64
supertuxkart-data-0.7.3-5.fc19.noarch
supertuxkart-0.7.3-5.fc19.x86_64

In , Ross Lagerwall (rosslagerwall) wrote :

Created attachment 78629
i915 error state

In , Ross Lagerwall (rosslagerwall) wrote :

Created attachment 78630
System log

In , Ross Lagerwall (rosslagerwall) wrote :

Created attachment 78631
Xorg log

In , Ross Lagerwall (rosslagerwall) wrote :

Created attachment 78632
lspci -nn

In , Ross Lagerwall (rosslagerwall) wrote :

Created attachment 78633
glxinfo

In , Ross Lagerwall (rosslagerwall) wrote :

Created attachment 78634
cat /proc/cpuinfo

In , Chris Wilson (ickle) wrote :

It will be interesting to see whether

commit 4615d4c9e27eda42c3e965f208a4b4065841498c
Author: Chris Wilson <email address hidden>
Date: Mon Apr 8 14:28:40 2013 +0100

    drm/i915: Use MLC (l3$) for context objects

has any impact. Can you please try the current drm-intel-nightly kernel from ppa:mainline?

Changed in xserver-xorg-video-intel:
importance: Unknown → Medium
status: Unknown → Confirmed
In , Ross Lagerwall (rosslagerwall) wrote :

(In reply to comment #7)
> This will be interesting to see if
>
> commit 4615d4c9e27eda42c3e965f208a4b4065841498c
> Author: Chris Wilson <email address hidden>
> Date: Mon Apr 8 14:28:40 2013 +0100
>
> drm/i915: Use MLC (l3$) for context objects
>
> has any impact. Can you please try the current drm-intel-nightly kernel from
> ppa:mainline?

Yes, it seems to work well with the drm-intel-nightly kernel.

In , Daniel-ffwll (daniel-ffwll) wrote :

Can you please check whether cherry-picking the referenced patch to a stable kernel fixes the issues, too?

Changed in xserver-xorg-video-intel:
status: Confirmed → Incomplete
In , Ross Lagerwall (rosslagerwall) wrote :

(In reply to comment #9)
> Can you please check whether cherry-picking the referenced patch to a stable
> kernel fixes the issues, too?

Yes, applying it on top of the Ubuntu 3.8 kernel worked fine.

In , Daniel-ffwll (daniel-ffwll) wrote :

I've just sent out the stable backport request, so this should get fixed in the next stable kernel releases (or one of the next; around the merge window there's usually a bit of lag due to the high patch load).

Thanks for reporting this issue and please reopen if it breaks again.

Changed in xserver-xorg-video-intel:
status: Incomplete → Fix Released
In , Ross Lagerwall (rosslagerwall) wrote :

On the same machine, I tried running SuperTuxKart on Arch Linux with Linux kernel 3.10-rc2, mesa 9.1.2 and Intel drivers 2.21.6.

I seemed to get the same hang, even though 3.10-rc2 contains the above-mentioned commit. I will attach the error state and relevant dmesg log.

In , Ross Lagerwall (rosslagerwall) wrote :

Created attachment 79626
i915 error state from v3.10-rc2

In , Ross Lagerwall (rosslagerwall) wrote :

Created attachment 79627
System log from v3.10-rc2

In , Chris Wilson (ickle) wrote :

Aye, that appears to be the same hang.

Changed in xserver-xorg-video-intel:
status: Fix Released → Confirmed
In , Chris Wilson (ickle) wrote :

Note that with IVB and MSAA I see lots of corruption, with large swaths of memory being overwritten with pixel values (lots of 0xffffffff especially). That would include the possibility of overwriting context memory. Isolating MSAA in mesa would be tricky... perhaps a hack to disable it?

In , Ross Lagerwall (rosslagerwall) wrote :

I can confirm that I did see strange white corruption when playing the
game, but I thought it was unrelated or an application error.

Unfortunately, I don't have access to the hardware anymore so I cannot
further test anything. However, given that the hangs happened on two
different OSes, with the latest kernel versions, it should be easy
enough to reproduce.

In , bwidawsk (bwidawsk) wrote :

(In reply to comment #17)
> I can confirm that I did see strange white corruption when playing the
> game, but I thought it was unrelated or an application error.
>
> Unfortunately, I don't have access to the hardware anymore so I cannot
> further test anything. However, given that the hangs happened on two
> different OSes, with the latest kernel versions, it should be easy
> enough to reproduce.

If someone can reproduce this, can they read back register 0x20f4?

In , Ross Lagerwall (rosslagerwall) wrote :

On Wed, Jun 26, 2013 at 11:04:43PM +0000, <email address hidden> wrote:
> https://bugs.freedesktop.org/show_bug.cgi?id=64073
>
> --- Comment #18 from Ben Widawsky <email address hidden> ---
> If someone can reproduce this, can they read back register 0x20f4?
>

Would that not be in the error state dump I attached?

In , Chris Wilson (ickle) wrote :

Hopefully https://patchwork.kernel.org/patch/2841344/ is the right fix.

In , bwidawsk (bwidawsk) wrote :

(In reply to comment #19)
> On Wed, Jun 26, 2013 at 11:04:43PM +0000, <email address hidden>
> wrote:
> > https://bugs.freedesktop.org/show_bug.cgi?id=64073
> >
> > --- Comment #18 from Ben Widawsky <email address hidden> ---
> > If someone can reproduce this, can they read back register 0x20f4?
> >
>
> Would that not be in the error state dump I attached?

No. But I can no longer remember what I wanted anyway.

In , Daniel-ffwll (daniel-ffwll) wrote :

(In reply to comment #20)
> Hopefully https://patchwork.kernel.org/patch/2841344/ is the right fix.

Can you please test the above patch?

Changed in xserver-xorg-video-intel:
status: Confirmed → Incomplete
In , Ross Lagerwall (rosslagerwall) wrote :

(In reply to comment #22)
> (In reply to comment #20)
> > Hopefully https://patchwork.kernel.org/patch/2841344/ is the right fix.
>
> Can you please test the above patch?

Unfortunately, as I said in comment #17, I don't have access to the IVB hardware anymore so I can't test the patch.

In , Daniel-ffwll (daniel-ffwll) wrote :

Hw no longer available for testing, so closing.

Changed in xserver-xorg-video-intel:
status: Incomplete → Invalid
penalvch (penalvch) wrote :

Charles Lease, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p xserver-xorg-video-intel REPLACE-WITH-BUG-NUMBER

Please note, given that the information from the prior release is already available, doing this on a release prior to the development one would not be helpful.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

Changed in xserver-xorg-video-intel (Ubuntu):
importance: Undecided → Low
status: Confirmed → Incomplete
Chris Wilson (ickle) wrote :

Ultimately this turned out to be a Mesa bug, fixed around 10.1.3 or so.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Incomplete → Fix Released