[sandybridge-m-gt2+ mesa] GPU lockup IPEHR: 0x445c4000 IPEHR: 0x01000000

Bug #1153587 reported by Stan Schymanski
This bug affects 1 person
Affects: xserver-xorg-video-intel (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone:

Bug Description

I have been having random hang-ups for more than a year now, but only since the beginning of last week have I been getting the GPU hang-up error messages, with or without system hang-ups. This one coincided with one of the complete lock-ups, which could only be resolved by Alt+SysRq-REISUB. I will now try the approach proposed in Comment #81, hoping for the best. I hope that I am not seeing two different issues here. The GPU hang-up error message has been recurring without obvious hang-ups in the past few days, while my complete hang-ups have been happening randomly: sometimes with the possibility to reboot as outlined above, sometimes without (only the power button would do), and sometimes the computer turned itself off without any interaction from my side and without leaving any trace of a crash in the log files.

ProblemType: Crash
DistroRelease: Ubuntu 12.10
Package: xserver-xorg-video-intel 2:2.20.9-0ubuntu2
ProcVersionSignature: Ubuntu 3.5.0-26.40-generic 3.5.7.6
Uname: Linux 3.5.0-26-generic x86_64
ApportVersion: 2.6.1-0ubuntu10
Architecture: amd64
Chipset: sandybridge-m-gt2+
Date: Mon Mar 11 14:03:28 2013
DistroCodename: quantal
DistroVariant: ubuntu
DuplicateSignature: [sandybridge-m-gt2+] GPU lockup IPEHR: 0x445c4000 IPEHR: 0x01000000 Ubuntu 12.10
ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
GpuHangFrequency: Several times a week
InstallationDate: Installed on 2011-06-28 (621 days ago)
InstallationMedia: Ubuntu 11.04 "Natty Narwhal" - Release amd64 (20110427)
InterpreterPath: /usr/bin/python3.2mu
MachineType: Dell Inc. Latitude E6320
MarkForUpload: True
ProcCmdline: /usr/bin/python3 /usr/share/apport/apport-gpu-error-intel.py
ProcEnviron:

ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.5.0-26-generic root=UUID=5083e04c-1bad-44bf-a241-c839914a697a ro crashkernel=384M-2G:64M,2G-:128M quiet splash i915.semaphores=0
RelatedPackageVersions:
 xserver-xorg 1:7.7+1ubuntu4
 libdrm2 2.4.39-0ubuntu1
 xserver-xorg-video-intel 2:2.20.9-0ubuntu2
SourcePackage: xserver-xorg-video-intel
Title: [sandybridge-m-gt2+] GPU lockup IPEHR: 0x445c4000 IPEHR: 0x01000000
UdevDb: Error: [Errno 2] No such file or directory: 'udevadm'
UpgradeStatus: Upgraded to quantal on 2012-10-27 (135 days ago)
UserGroups:

dmi.bios.date: 08/15/2012
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A15
dmi.board.name: 087HK7
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 9
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA15:bd08/15/2012:svnDellInc.:pnLatitudeE6320:pvr01:rvnDellInc.:rn087HK7:rvrA00:cvnDellInc.:ct9:cvr:
dmi.product.name: Latitude E6320
dmi.product.version: 01
dmi.sys.vendor: Dell Inc.

Stan Schymanski (schymans) wrote :
tags: removed: need-duplicate-check
Chris Wilson (ickle) wrote :

This is a completely separate issue to bug 1041790, as this appears to be a bug in mesa itself.

summary: - [sandybridge-m-gt2+] GPU lockup IPEHR: 0x445c4000 IPEHR: 0x01000000
+ [sandybridge-m-gt2+ mesa] GPU lockup IPEHR: 0x445c4000 IPEHR:
+ 0x01000000
Chris Wilson (ickle) wrote :

Actually, on closer inspection this is a TLB-invalidate bug. Fixed in raring.

Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Fix Released
Stan Schymanski (schymans) wrote :

Dear Chris,

Thanks for your quick responses. I cannot wait until the release of raring. Could you tell me a work-around or do you know what version of Ubuntu/Debian I would need to install to make my system work again? Thanks for your help!

Chris Wilson (ickle) wrote :

You need:

commit 7d54a904285b6e780291b91a518267bec5591913
Author: Chris Wilson <email address hidden>
Date: Fri Aug 10 10:18:10 2012 +0100

    drm/i915: Apply post-sync write for pipe control invalidates

and

commit 3ac7831314eba873d60b58718123c503f6961337
Author: Jesse Barnes <email address hidden>
Date: Thu Oct 25 12:15:47 2012 -0700

    drm/i915: PIPE_CONTROL TLB invalidate requires CS stall

Neither of these is marked for stable; they are available from v3.6 and v3.8 respectively, i.e. in raring.
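A rough way to tell whether a given kernel release is new enough to contain both commits is to compare its version against the releases named above. The following is a minimal sketch, not an authoritative check: it assumes a plain mainline version comparison is meaningful (a distribution kernel may carry backports that this cannot see), and the names `REQUIRED`, `parse_release` and `missing_fixes` are hypothetical helpers introduced only for illustration.

```python
import re

# First mainline release containing each commit, as stated in the comment
# above (v3.6 and v3.8). Keys are abbreviated commit ids from that comment.
REQUIRED = {
    "7d54a904285b": (3, 6),  # drm/i915: Apply post-sync write for pipe control invalidates
    "3ac7831314eb": (3, 8),  # drm/i915: PIPE_CONTROL TLB invalidate requires CS stall
}


def parse_release(release):
    """Return (major, minor) from a release string such as '3.5.0-26-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(m.group(1)), int(m.group(2))


def missing_fixes(release):
    """List commit ids whose first mainline release is newer than `release`.

    Assumption: no backports. A distro kernel may already include a fix
    even though its version number is lower.
    """
    have = parse_release(release)
    return sorted(c for c, need in REQUIRED.items() if have < need)
```

Calling `missing_fixes(platform.release())` on the affected machine would apply the same comparison to the running kernel.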

Stan Schymanski (schymans) wrote : Re: [Bug 1153587] Re: [sandybridge-m-gt2+ mesa] GPU lockup IPEHR: 0x445c4000 IPEHR: 0x01000000

Thanks, Chris!
Could you also point me to some instructions on how to install these commits?
Sorry, but I haven't dared tinkering with my system at that level yet, but
having no alternatives, I may have to do it. My system is barely usable
right now.
Thanks for your patience!

On Mon, Mar 11, 2013 at 10:52 PM, Chris Wilson <email address hidden> wrote:

> You need:
>
> commit 7d54a904285b6e780291b91a518267bec5591913
> Author: Chris Wilson <email address hidden>
> Date: Fri Aug 10 10:18:10 2012 +0100
>
> drm/i915: Apply post-sync write for pipe control invalidates
>
> and
>
> commit 3ac7831314eba873d60b58718123c503f6961337
> Author: Jesse Barnes <email address hidden>
> Date: Thu Oct 25 12:15:47 2012 -0700
>
> drm/i915: PIPE_CONTROL TLB invalidate requires CS stall
>
> which are neither marked for stable, and available in v3.6 and v3.8
> respectively i.e. raring.
>
Chris Wilson (ickle) wrote :

If you install a v3.8 kernel from ppa:mainline you should hopefully get a more stable system until a suitable kernel is in the normal repository.

Stan Schymanski (schymans) wrote :

Thanks, Chris. Unfortunately, after installing v3.8.2, the GPU-lockup messages just kept going.

Stan Schymanski (schymans) wrote :

Below is an example from my syslog. I just realised that this is a slightly different "GPU hung" message, but the result is the same for me.

Mar 12 23:01:48 sppc26 kernel: [ 0.000000] Linux version 3.8.2-030802-generic (root@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201303031906 SMP Mon Mar 4 00:07:09 UTC 2013
Mar 12 23:01:48 sppc26 kernel: [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.8.2-030802-generic root=UUID=5083e04c-1bad-44bf-a241-c839914a697a ro crashkernel=384M-2G:64M,2G-:128M quiet splash i915.i915_enable_rc6=0
.
.
.
Mar 12 23:09:37 sppc26 kernel: [ 505.650930] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 12 23:09:37 sppc26 kernel: [ 505.650935] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Mar 12 23:12:46 sppc26 kernel: [ 695.152169] SysRq : This sysrq operation is disabled.
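To count how often the hangcheck message appears, and at what uptime, lines like the ones above can be filtered out of a syslog file. This is a minimal sketch that assumes the line layout shown in the excerpt (syslog timestamp, host name, `kernel:`, bracketed uptime); `HANG_RE` and `find_gpu_hangs` are hypothetical names, and the log file location is left to the caller.

```python
import re

# Matches kernel lines shaped like the excerpt above, e.g.
# "Mar 12 23:09:37 sppc26 kernel: [ 505.650930] [drm:i915_hangcheck_hung] ... GPU hung"
HANG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+\[drm:i915_hangcheck_hung\].*GPU hung"
)


def find_gpu_hangs(lines):
    """Return (timestamp, uptime_seconds) for every hangcheck 'GPU hung' line."""
    hangs = []
    for line in lines:
        m = HANG_RE.match(line)
        if m:
            hangs.append((m.group("stamp"), float(m.group("uptime"))))
    return hangs
```

Passing an open log file, for example `find_gpu_hangs(open(path))`, yields one tuple per reported hang; lines that do not match the assumed layout are silently skipped.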

Chris Wilson (ickle) wrote :

Note I do need the /sys/kernel/debug/dri/0/i915_error_state to identify the error.

Stan Schymanski (schymans) wrote :

I wish I could provide that. Just had another fatal "GPU hung", but nothing in i915_error_state:

Mar 13 14:11:56 sppc26 kernel: [ 2601.438055] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 13 14:11:56 sppc26 kernel: [ 2601.438060] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Mar 13 14:11:56 sppc26 kernel: [ 2601.454040] [drm:__gen6_gt_force_wake_get] *ERROR* Timed out waiting for forcewake old ack to clear.
Mar 13 14:14:55 sppc26 kernel: [ 2780.589097] SysRq : This sysrq operation is disabled.

BUT:
~$ sudo more /sys/kernel/debug/dri/0/i915_error_state
no error state collected

Is there any other way to collect some useful information?
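The "SysRq : This sysrq operation is disabled." lines in both log excerpts suggest that at least one key of the REISUB sequence was rejected. The kernel gates each SysRq function behind a bit in `/proc/sys/kernel/sysrq`; the sketch below maps the REISUB keys to those bits as listed in the kernel's sysrq documentation. The function name `allowed_keys` is hypothetical, and the mask value 176 used in the note afterwards is only an example, since the thread does not state the reporter's actual setting.

```python
# Bit values for /proc/sys/kernel/sysrq, per the kernel sysrq documentation.
SYSRQ_BITS = {
    "R": 0x04,  # unraw keyboard (keyboard control)
    "E": 0x40,  # SIGTERM to all processes (signalling)
    "I": 0x40,  # SIGKILL to all processes (signalling)
    "S": 0x10,  # sync
    "U": 0x20,  # remount read-only
    "B": 0x80,  # reboot
}


def allowed_keys(mask, sequence="REISUB"):
    """Return the keys of `sequence` that sysrq bitmask `mask` permits."""
    if mask == 1:  # the value 1 means "enable all functions"
        return list(sequence)
    return [key for key in sequence if mask & SYSRQ_BITS[key]]
```

With an example mask of 176 (0x80 + 0x20 + 0x10) only S, U and B would be accepted, so R, E and I would each produce the "disabled" message while sync, remount and reboot still work.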

Stan Schymanski (schymans) wrote :

As much as I wish to help debug this, I really need my system to work again ASAP. Do you think it would help to do a fresh install of Ubuntu 12.10 or any other Linux distribution? Thanks again for your help!

Stan Schymanski (schymans) wrote :

Just managed to upload one of the apports as a new bug report here: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1154591

Hope the desired information is there.

Chris Wilson (ickle) wrote :

That would imply you rebooted before inspecting the error state (or, even less likely, wrote to i915_error_state in order to clear it). That error message indicates something is wrong with rc6: the GPU never wakes up.
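Because the error state does not survive a reboot, it has to be copied to disk while the machine is still reachable (for example over SSH or from a text console). A minimal sketch of such a copy follows, assuming the debugfs path quoted in this thread and the "no error state collected" text shown above as the empty marker; `save_error_state` is a hypothetical helper, and reading the file normally requires root.

```python
ERROR_STATE = "/sys/kernel/debug/dri/0/i915_error_state"  # path quoted in the thread
EMPTY_MARKER = b"no error state collected"  # output seen when nothing was captured


def save_error_state(dst, src=ERROR_STATE):
    """Copy the error state to `dst`; return False if nothing was captured."""
    with open(src, "rb") as f:
        data = f.read()
    if not data.strip() or data.strip() == EMPTY_MARKER:
        return False  # nothing useful to keep
    with open(dst, "wb") as f:
        f.write(data)
    return True
```

Running this right after a "GPU hung" message, and before any reboot, leaves a file that can be attached to the bug report.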

Stan Schymanski (schymans) wrote :

Correct, I had to reboot, as the system became unresponsive. Can I deactivate rc6 somehow?
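The kernel command line in the 3.8.2 syslog excerpt earlier in this thread already contains `i915.i915_enable_rc6=0`, while the original 3.5.0 report shows `i915.semaphores=0`. Which i915 options a given boot actually received can be read from its command line; the sketch below parses such a string. `i915_options` is a hypothetical helper, and whether the option shown is sufficient to rule out rc6 on this machine is not established in the thread.

```python
def i915_options(cmdline):
    """Return {name: value} for every `i915.<name>=<value>` token in `cmdline`."""
    opts = {}
    for token in cmdline.split():
        if token.startswith("i915.") and "=" in token:
            name, value = token[len("i915."):].split("=", 1)
            opts[name] = value
    return opts
```

On a running system the input would be the contents of `/proc/cmdline`, e.g. `i915_options(open("/proc/cmdline").read())`.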
