[snb] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001, workaround i915.semaphores=0

Bug #1041790 reported by Rocko on 2012-08-26
This bug affects 226 people
Affects                            Status       Importance  Assigned to  Milestone
xf86-video-intel                   In Progress  Medium
linux (Ubuntu)                                  Low         Unassigned
sandybridge-meta (Ubuntu)                       Undecided   Unassigned
xserver-xorg-video-intel (Ubuntu)               High        Unassigned

Bug Description

X locks up periodically for 2 to 10 seconds at a time and this crash log gets generated. It happens significantly more often than several times a day, but not quite continuously. If you do have this bug, the workaround below should stop the lockups from happening. Regardless, please file a new bug report so your hardware can be tracked.

WORKAROUND: Edit your /etc/default/grub from:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash i915.semaphores=0"

run the following and reboot:
sudo update-grub

The side effect of this is that rendering throughput drops by 10% with SNA, or by as much as 3x with UXA. OpenGL performance is likely to be reduced by about 30%. More CPU time is spent waiting for the GPU with rc6 disabled, so power consumption increases.
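
For anyone who wants to script the edit, here is a minimal sketch of the same change. The helper name `add_grub_param` is made up for illustration. It assumes GNU sed and that GRUB_CMDLINE_LINUX_DEFAULT sits on a single double-quoted line, as in the example above. Back up /etc/default/grub before trying it.

```shell
# Hypothetical helper (not part of GRUB): append a kernel parameter to
# GRUB_CMDLINE_LINUX_DEFAULT in the given file unless it is already there.
add_grub_param() {
    file="$1"
    param="$2"
    # Already present on the GRUB_CMDLINE_LINUX_DEFAULT line? Then do nothing.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote (GNU sed -i).
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file"
}
```

Run as root, this would be called as `add_grub_param /etc/default/grub i915.semaphores=0`, followed by `sudo update-grub` and a reboot. After the reboot, `grep -o 'i915.semaphores=[0-9]*' /proc/cmdline` should print the parameter if the kernel was booted with it.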

ProblemType: Crash
DistroRelease: Ubuntu 12.10
Package: xserver-xorg-video-intel 2:2.20.3-0ubuntu1
Uname: Linux 3.6.0-rc3-git-20120826.1015 x86_64
ApportVersion: 2.5.1-0ubuntu2
Architecture: amd64
Chipset: sandybridge-m-gt2
Date: Sun Aug 26 16:06:32 2012
DistroCodename: quantal
DistroVariant: ubuntu
DuplicateSignature: [sandybridge-m-gt2] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001 Ubuntu 12.10
EcryptfsInUse: Yes
ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
GpuHangFrequency: Continuously
InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Alpha amd64 (20120724.2)
InterpreterPath: /usr/bin/python3.2mu
MachineType: Dell Inc. Dell System XPS L502X
ProcCmdline: /usr/bin/python3 /usr/share/apport/apport-gpu-error-intel.py
ProcEnviron:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.6.0-rc3-git-20120826.1015 root=UUID=135c8090-427c-460a-909d-eff262cd44b6 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 xserver-xorg 1:7.7+1ubuntu3
 libdrm2 2.4.38-0ubuntu2
 xserver-xorg-video-intel 2:2.20.3-0ubuntu1
SourcePackage: xserver-xorg-video-intel
Title: [sandybridge-m-gt2] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001
UdevDb: Error: [Errno 2] No such file or directory: 'udevadm'
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

dmi.bios.date: 05/29/2012
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A11
dmi.board.name: 0NJT03
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.chassis.version: 0.1
dmi.modalias: dmi:bvnDellInc.:bvrA11:bd05/29/2012:svnDellInc.:pnDellSystemXPSL502X:pvr:rvnDellInc.:rn0NJT03:rvrA00:cvnDellInc.:ct8:cvr0.1:
dmi.product.name: Dell System XPS L502X
dmi.sys.vendor: Dell Inc.

Rocko (rockorequin) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Confirmed
tags: removed: need-duplicate-check

Created attachment 66289
dmesg output

From time to time the interface freezes, and these records appear in dmesg: [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blitter ring idle

$ lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
00:1c.1 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5)
00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b5)
00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b5)
00:1c.4 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 5 (rev b5)
00:1d.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation H61 Express Chipset Family LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
02:00.0 PCI bridge: ASMedia Technology Inc. Device 1080 (rev 01)
03:01.0 Multimedia audio controller: VIA Technologies Inc. VT1720/24 [Envy24PT/HT] PCI Multi-Channel Audio Controller (rev 01)
04:00.0 Ethernet controller: Atheros Communications AR8151 v2.0 Gigabit Ethernet (rev c0)
05:00.0 USB Controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
06:00.0 SATA controller: ASMedia Technology Inc. Device 0612 (rev 01)

Bryce Harrington (bryce) wrote :

Does switching from UXA to SNA help?

Changed in xserver-xorg-video-intel (Ubuntu):
importance: Undecided → Medium
Rocko (rockorequin) wrote :

Ha! I thought SNA was turned on by default, but it isn't, is it? Is it possible to switch between SNA and UXA while X is running, or to tell which one is being used?

I've turned SNA on via AccelMethod in xorg.conf now, so I'll see if the freezes go away.

Since I restarted X with SNA, the titlebars of windows that don't have the focus change their background to light grey. The window buttons and the title text stay the same, which looks weird. Is that something that can be configured?

Rocko (rockorequin) wrote :

Is SNA turned on by default now? I had a couple of hours freeze-free with it the other day, but removed my xorg.conf shortly afterwards because the white titlebars and glitchy 3D graphics were annoying, and also because with SNA enabled the backlight didn't come on after the screensaver turned it off. But now the titlebars are white again.

Bryce Harrington (bryce) wrote :

SNA is not the default for quantal. No, there is not a way to toggle between UXA and SNA at run time. /var/log/Xorg.0.log is where to look to see which acceleration tech is active.

If I understand your testing feedback, you do believe SNA helps eliminate the freeze behaviors, and thus we can consider UXA the likely source of the bug.

Rocko (rockorequin) wrote :

Yes, I think the bug doesn't happen with SNA whereas it occurs pretty regularly with UXA. I've been using SNA for a couple of days now since it became the default on my system. Does X now look for other xorg.conf files? I created one called /etc/X11/xorg.conf-intel-sna and symlinked to it to test out SNA; then I deleted the symlink, and a day or two later suddenly SNA became the default.

Rocko (rockorequin) wrote :

Ah, I am using xorg-edgers. Perhaps they are trying out SNA as the default there.

Rocko (rockorequin) wrote :

I've been using SNA for a couple of weeks now, and it doesn't seem to suffer from this particular bug.

The bug still occurs in the latest xf86-video-intel driver from git (as of 27/9/12), though. It generally occurs when focus changes, eg when a menu or popup window is opening.

Ursula Junque (ursinha) wrote :

Hi Bryce, I've been getting this error every once in a while and when it happens, apport tries to report the bug like ten times. Let me know if I can provide more information about it.

Cheers,

Ursula Junque (ursinha) wrote :

I've filed another bug with apport and all my files are attached there: bug 1059737, just in case they're not duplicates.

Paul Smedley (paul-smedley) wrote :

Switching from UXA to SNA fixes this for me too, on an Asus Zenbook UX31E

Dimitri John Ledkov (xnox) wrote :

I am hitting this bug. Can somebody please explain how to check if I am using UXA or SNA and how to switch between the two? If SNA helps, and I am using UXA I'd like to try SNA.

Rocko (rockorequin) wrote :

@Dmitrijs: To find which method is being used, do:

grep AccelMethod /var/log/Xorg.0.log

I find also that the titlebars of non-focused windows are often light grey instead of black when using SNA.

And to change methods, put this in your xorg.conf to set the acceleration method and then restart X:

Section "Device"
 Identifier "Card0"
 Driver "intel"
 Option "AccelMethod" "sna" # or uxa, as appropriate
EndSection
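
The `grep AccelMethod` check above may print nothing when the option is not set explicitly in xorg.conf. As a rough alternative, the sketch below guesses the active method from the driver's own log messages. The function name `accel_method` is hypothetical, and the two match strings ("SNA initialized" and the "UXA(" log prefix) are assumptions about what the intel driver writes to the log; they may differ between driver versions.

```shell
# Hypothetical helper: guess the acceleration method from an Xorg log.
# The match strings are assumptions about the intel driver's log output.
accel_method() {
    log="${1:-/var/log/Xorg.0.log}"
    if grep -q 'SNA initialized' "$log"; then
        echo sna
    elif grep -q 'UXA(' "$log"; then
        echo uxa
    else
        echo unknown
    fi
}
```

Calling `accel_method` with no argument reads /var/log/Xorg.0.log. If it prints `unknown`, fall back to reading the log by hand.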


If you can easily reproduce this error, can you please build a kernel using http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=xv-overlay which has some revised memory barriers.

Can you help me build an RPM for Fedora?

Rocko (rockorequin) wrote :

I still experience this bug, even with the latest intel driver from git, xf86-video-intel-2.6.99.902. I would use SNA but it has an even more annoying bug after the screen saver unlocks where unity just shows me a black screen and mouse cursor, and I have to physically restart unity to get it working again.


On second thoughts, I think this should be fixed by the slight robustification in more recent hangcheck.

Please try the latest kernel for your distribution (should be 3.6.7 atm) and reopen if it still occurs.

I am using Fedora 18 with the 3.6.7-5.fc18.i686 kernel, and the dmesg output still contains this message:
[22826.654365] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[22826.654369] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

That is not the same bug, so you need to attach a fresh set of debug info (please remember the i915_error_state)...

Please explain how to get the needed debug info. Thanks.

http://intellinuxgraphics.org/how_to_report_bug.html

From which we need the i915_error_state, so

$ sudo mount -tdebugfs debug /sys/kernel/debug
$ sudo cat /sys/kernel/debug/dri/0/i915_error_state > i915_error_state
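
Since the error state is requested repeatedly in this thread, a small wrapper can help keep the dumps apart. This is only a sketch: `first_existing` and `save_error_state` are hypothetical helper names, the timestamped file name is an arbitrary choice, and the script has to run as root because debugfs is normally not readable by other users.

```shell
# Hypothetical helper: print the first path from the argument list that exists.
first_existing() {
    for p in "$@"; do
        if [ -e "$p" ]; then
            printf '%s\n' "$p"
            return 0
        fi
    done
    return 1
}

# Copy the error state to a timestamped file in the current directory
# and print the name of the file that was written.
save_error_state() {
    src=$(first_existing "$@") || { echo "no error state file found" >&2; return 1; }
    out="i915_error_state-$(date +%Y%m%d-%H%M%S)"
    cat "$src" > "$out" && printf '%s\n' "$out"
}
```

A call such as `save_error_state /sys/class/drm/card0/error /sys/kernel/debug/dri/0/i915_error_state` tries both locations that are mentioned in this report and copies whichever one exists first.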

Created attachment 70518
i915_error_state

That looks like it corresponds to the bug that

commit 1c8b46fc8c865189f562c9ab163d63863759712f
Author: Chris Wilson <email address hidden>
Date: Wed Nov 14 09:15:14 2012 +0000

    drm/i915: Use LRI to update the semaphore registers

    The bspec was recently updated to remove the ability to update the
    semaphore using the MI_SEMAPHORE_BOX command, the ability to wait upon
    the semaphore value remained. Instead the advice is to update the
    register using the MI_LOAD_REGISTER_IMM command. In cursory testing,
    semaphores continue to function - the question is whether this fixes
    some of the deadlocks where the semaphore registers contained stale
    values?

hopefully addresses.

That patch is only available on drm-intel-next at the moment, which is available either at http://cgit.freedesktop.org/~danvet/drm-intel or available as drm-intel-experimental in the ubuntu kernel-ppa.

Karma Dorje (taaroa) on 2012-11-28
tags: added: raring
Timo Aaltonen (tjaalton) wrote :

I've uploaded -intel 2.20.14 to raring, so please test with both UXA and SNA to see if either or both work.

Rocko: I can't reproduce your bug with SNA (with this new version anyway), works fine on my T420s. 2.6.99.902 sounds old too :)

Changed in xserver-xorg-video-intel (Ubuntu):
status: Confirmed → Incomplete
Rocko (rockorequin) wrote :

Yes, I've been running v2.20.14 from git (using SNA, not UXA) for a few days on Quantal and so far I haven't seen that other bug I mentioned - it hasn't fatally locked up after the screensaver kicks in. However, it has experienced *this* particular bug a few times, ie where the screen locks but I can fix it by switching to a tty terminal and back.

Re 2.6.99.902, I think I probably did a git tag command and looked at the last entry, which is definitely old. I would have been running a pre-v2.20.14 version at the time.

Karma Dorje (taaroa) wrote :

@Timo Aaltonen
SNA — ok. looks like some sort of regression in the driver.

Bryce Harrington (bryce) wrote :

Rocko, thanks for testing the git DDX. Next time you get one of these freezes can you please collect a fresh i915_error_state, dmesg, and Xorg.0.log?

Sounds like this bug should go upstream.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Incomplete → New
status: New → Incomplete

Problem repeated with patched kernel.

[118637.439016] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[118637.439020] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[mikhail@localhost ~]$ uname -a
Linux localhost.localdomain 3.6.9-4.1.fc18.i686.PAE #1 SMP Wed Dec 5 15:16:33 UTC 2012 i686 i686 i386 GNU/Linux
[mikhail@localhost ~]$ sudo cat /sys/kernel/debug/dri/0/i915_error_state > i915_error_state
[sudo] password for mikhail:
[mikhail@localhost ~]$

Created attachment 71192
i915_error_state (new)

sudo cat /sys/kernel/debug/dri/0/i915_error_state > i915_error_state-8
cat: /sys/kernel/debug/dri/0/i915_error_state: Cannot allocate memory

What does it mean?

Created attachment 71199
i915_error_state (new)

Created attachment 71200
dmesg output (new)

Lalalalala.

bugbot (bugbot) wrote :

We're closing this bug since there has not been a response from the original reporter. However, if the issue still exists, please feel free to reopen with the requested information. If you're not the original reporter, we'd prefer you file a new bug report.

Some tips:

  * Report X.org bugs via the command: `ubuntu-bug xorg`

  * Test against the latest development Ubuntu. http://cdimage.ubuntu.com/daily-live/
    Bugs marked as affecting the development version tend to get priority attention.

  * The `xdiagnose` utility has functionality for enabling debugging and
    analyzing a few common X problems.

  * Tag your bugs with the Ubuntu versions you have reproduced the issue in.

  * See https://wiki.ubuntu.com/X/Reporting for tips on writing good bug reports.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Incomplete → Expired
Adam Conrad (adconrad) on 2013-01-08
Changed in xserver-xorg-video-intel (Ubuntu):
status: Expired → Confirmed
Timo Aaltonen (tjaalton) on 2013-01-09
Changed in xserver-xorg-video-intel (Ubuntu):
assignee: nobody → Timo Aaltonen (tjaalton)
status: Confirmed → Incomplete
Rocko (rockorequin) wrote :

I've seen it happen with kernel 3.8-rc2 and SNA using the latest intel driver from git.

The hang isn't always the same:

* Sometimes it locks the computer up completely, requiring a hard reboot.

* Sometimes it locks X, but CTRL-ALT-F1 and back unlocks it.

* Sometimes it resolves itself without me even noticing that it has happened, other than that there may be some corruption in the tabs' title text in chrome and window movement has become somewhat jerky instead of the normal smooth movement you get after restarting X.

Next time it happens I'll see if I can recover any information.

Adam Conrad (adconrad) wrote :

Timo: I've never had it completely hang the machine, but I've also not been patient enough to sit around and wait to see if X will eventually recover on its own, I always do a VT switch out and back (and get welcomed by an apport dialog)

Has happened several times today. Will be upgrading to 3.8.0-rc soon to see if that helps, but the comment above me doesn't give much hope.

Changed in xserver-xorg-video-intel:
importance: Unknown → Medium
status: Unknown → Confirmed
Timo Aaltonen (tjaalton) on 2013-01-22
Changed in xserver-xorg-video-intel (Ubuntu):
importance: Medium → High
status: Incomplete → Triaged
Timo Aaltonen (tjaalton) on 2013-01-22
Changed in xserver-xorg-video-intel (Ubuntu):
status: Triaged → Confirmed
Bryce Harrington (bryce) on 2013-02-04
Changed in xserver-xorg-video-intel (Ubuntu):
status: Confirmed → Triaged
Chris Wilson (ickle) on 2013-02-25
summary: - [sandybridge-m-gt2] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001
+ [snb] GPU lockup IPEHR: 0x0b160001 IPEHR: 0x0b140001, workaround
+ i915.semaphores=0
Bryce Harrington (bryce) on 2013-03-02
Changed in linux (Ubuntu):
importance: Undecided → High
Brad Figg (brad-figg) on 2013-03-02
Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Changed in sandybridge-meta (Ubuntu):
status: New → Confirmed
Bryce Harrington (bryce) on 2013-04-05
description: updated
description: updated
description: updated
description: updated
Bryce Harrington (bryce) on 2013-04-05
description: updated
Bryce Harrington (bryce) on 2013-04-05
Changed in linux (Ubuntu):
status: Invalid → New
Brad Figg (brad-figg) on 2013-04-05
Changed in linux (Ubuntu):
status: New → Confirmed
Changed in xserver-xorg-video-intel:
status: Confirmed → Incomplete
Changed in xserver-xorg-video-intel:
status: Incomplete → Confirmed
Bryce Harrington (bryce) on 2013-04-22
tags: added: kernel-handoff-graphics
Changed in xserver-xorg-video-intel:
status: Confirmed → Incomplete
Changed in xserver-xorg-video-intel:
status: Incomplete → Confirmed
Alan Pope  (popey) on 2013-09-08
description: updated
description: updated
tags: added: bios-outdated-a12
Changed in linux (Ubuntu):
importance: High → Low
status: Confirmed → Incomplete
tags: added: needs-upstream-testing regression-potential
no longer affects: linuxmint
Changed in xserver-xorg-video-intel:
status: Confirmed → In Progress
Timo Aaltonen (tjaalton) on 2014-08-15
Changed in xserver-xorg-video-intel (Ubuntu):
assignee: Timo Aaltonen (tjaalton) → nobody

Hi Chris,

OK, nothing of the above was the reason. In my case it's simply this:

/etc/X11/xorg.conf.d/20-intel.conf

Section "Device"
   Identifier "Intel Graphics"
   Driver "intel"
   Option "TearFree" "true"
EndSection

I added it when the tearing scrolling through large webpages annoyed me.
As soon as I added it, the problems quickly started.

Selfmade problem.

Frank

(In reply to comment #189)
> Hi Chris,
>
> OK, nothing of the above was the reason. In my case it's simply this:
>
> /etc/X11/xorg.conf.d/20-intel.conf
>
> Section "Device"
> Identifier "Intel Graphics"
> Driver "intel"
> Option "TearFree" "true"
> EndSection
>
>
> I added it when the tearing scrolling through large webpages annoyed me.
> As soon as I added it, the problems quickly started.
>
> Selfmade problem.

Not really, https://bugs.freedesktop.org/show_bug.cgi?id=70764 tracks that this hang is more likely with TearFree (fundamentally the hang is still the same hardware issue, but it is interesting that TearFree has a higher chance of hitting it).

If you want to experiment:

 http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=requests

should have an interesting fix, at least for trying to prevent the TearFree leading to the semaphore hang.

What information is most useful for these repeating issues, as it just happened again:

 Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139690] [drm] stuck on render ring
 Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139699] [drm] stuck on blitter ring
 Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.140239] [drm] GPU HANG: ecode 0:0xf4e9fffe, in Xorg [26353], reason: Ring hung, action: reset
 Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.140750] [drm:i915_context_is_banned] *ERROR* gpu hanging too fast, banning!
 Sep 16 08:32:59 arrowsmithlap1 kernel: [drm] stuck on render ring
 Sep 16 08:32:59 arrowsmithlap1 kernel: [drm] stuck on blitter ring
 Sep 16 08:32:59 arrowsmithlap1 kernel: [drm] GPU HANG: ecode 0:0xf4e9fffe, in Xorg [26353], reason: Ring hung, action: reset
 Sep 16 08:32:59 arrowsmithlap1 kernel: [drm:i915_context_is_banned] *ERROR* gpu hanging too fast, banning!
 Sep 16 08:33:01 arrowsmithlap1 kernel: [1182244.142445] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
 Sep 16 08:33:01 arrowsmithlap1 kernel: [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

The only thing under my /etc/X11/xorg.conf.d/ is 00-keyboard.conf (system generated).

Do you want a copy of /sys/class/drm/card0/error every time?

(In reply to comment #191)
> What information is most useful for these repeating issues, as it just
> happened again:
>
> Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139690] [drm] stuck on
> render ring
> Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139699] [drm] stuck on
> blitter ring

So long as it is the same event, there is no more information we need other than testing feedback for an eventual workaround.

(In reply to comment #184)
> (In reply to comment #183)
>
> I recommend configuring i915.semaphores=0. I did it and it doesn't freeze
> anymore.

Meanwhile I tested both i915.semaphores=0 and i915.semaphores=1, neither of which helped in my case. But with i915.semaphores=0 my system became much more unstable and even crashed on its own after some days without stress on graphics (I just ran some desktop apps like thunar or vlc for music only - no movies). With i915.semaphores=1 the system is at least stable (for some weeks) as long as I don't heavily use desktop applications.

*** Bug 85194 has been marked as a duplicate of this bug. ***

*** Bug 85333 has been marked as a duplicate of this bug. ***

*** Bug 85609 has been marked as a duplicate of this bug. ***

I am also experiencing this, on a Gentoo system running on a ThinkPad T440s. I'm not doing anything related to XBMC, simply using xrandr for multihead. The interesting thing is that DRI works fine on my laptop screen (glxgears reports 60fps, which is the refresh rate of my screen), but breaks when I move a window trying to use DRI (e.g. Chrome, glxgears) to the external monitor connected to the mini Display Port output.

I see this stuff in dmesg:

[ 3561.424762] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring
[ 3561.424770] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 3561.424772] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 3561.424774] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 3561.424776] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 3561.424778] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 3566.422957] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring
[ 3571.425143] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring
[ 3575.423680] [drm:ring_stuck] *ERROR* Kicking stuck wait on blitter ring

Seems like the same issue. I'm trying to downgrade X, mesa, et al., to try and get the system back in working order.
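
To get a rough idea of how often the hang fires, the kernel log can be filtered for the messages quoted in this report. A minimal sketch follows; the function name `count_gpu_hangs` is hypothetical and the pattern only covers the three message variants that appear in this thread, so other kernel versions may word them differently.

```shell
# Hypothetical helper: count hang-related lines read from standard input.
# The pattern covers only the message variants quoted in this bug report.
count_gpu_hangs() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the exit status clean in that case.
    grep -c -E 'GPU HANG|Hangcheck timer elapsed|Kicking stuck' || true
}
```

Typical use would be `dmesg | count_gpu_hangs`. When reading from syslog instead, note that some messages are logged twice (with and without the kernel timestamp, as in the excerpt further up), which inflates the count.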

*** Bug 79675 has been marked as a duplicate of this bug. ***

*** Bug 85972 has been marked as a duplicate of this bug. ***

*** Bug 86058 has been marked as a duplicate of this bug. ***

For those running Ubuntu, here is a build of a kernel based on 3.17.1 with the patches Chris Wilson wants you to test:

- Those patches have other regressions (so be careful to only test your specific issue).

https://dl.dropboxusercontent.com/u/55728161/linux-headers-3.17.1simonickle_3.17.1simonickle-10.00.Custom_amd64.deb
https://dl.dropboxusercontent.com/u/55728161/linux-image-3.17.1simonickle_3.17.1simonickle-10.00.Custom_amd64.deb

Those kernels are based on: https://bugs.freedesktop.org/show_bug.cgi?id=83677#c35

Beware, don't switch VTs.

I've tried the mentioned kernel on my Fedora 21 Beta and it still hangs, for example after NetBeans opens its main window across the whole screen.

*** Bug 86437 has been marked as a duplicate of this bug. ***

*** Bug 86765 has been marked as a duplicate of this bug. ***

*** Bug 86836 has been marked as a duplicate of this bug. ***

*** Bug 86925 has been marked as a duplicate of this bug. ***

*** Bug 87710 has been marked as a duplicate of this bug. ***

*** Bug 87776 has been marked as a duplicate of this bug. ***

*** Bug 88541 has been marked as a duplicate of this bug. ***

*** Bug 88626 has been marked as a duplicate of this bug. ***

*** Bug 88723 has been marked as a duplicate of this bug. ***

*** Bug 88789 has been marked as a duplicate of this bug. ***

*** Bug 89078 has been marked as a duplicate of this bug. ***

*** Bug 89299 has been marked as a duplicate of this bug. ***

*** Bug 89570 has been marked as a duplicate of this bug. ***

*** Bug 89671 has been marked as a duplicate of this bug. ***

*** Bug 89774 has been marked as a duplicate of this bug. ***

*** Bug 89771 has been marked as a duplicate of this bug. ***

*** Bug 89981 has been marked as a duplicate of this bug. ***

*** Bug 90106 has been marked as a duplicate of this bug. ***

*** Bug 90146 has been marked as a duplicate of this bug. ***

*** Bug 90271 has been marked as a duplicate of this bug. ***

*** Bug 90473 has been marked as a duplicate of this bug. ***

*** Bug 90835 has been marked as a duplicate of this bug. ***

Chris, you referred me to this bug as I reported

Bug 90835 - [4.1-rc6] gpu hang: ecode 6:-1:0x00000000, Kicking stuck semaphore on render ring

I skimmed through it and it appears that there are some patches to test? But I am not sure which ones these are. Can you or someone else enlighten me?

Also I note that I still use

        Option "AccelMethod" "uxa"

and I have

martin@merkaba:~> cat /etc/modprobe.d/i915-kms.conf
options i915 modeset=1 i915_enable_rc6=7

thus maximum energy saving. But according to powertop it never enters the highest sleep state anyway.

I will remove the AccelMethod setting now and see whether it helps. If not, I downgrade to 4.1-rc4 for now, as issues have been at least much less frequent with it.

And for me 4.1-rc6 really makes things much *worse*. I am typing this after a clean reboot and already got the GPU hang again. It happens about every few minutes. Are you really sure this is the same GPU hang? I didn't have this before the 4.1 kernel.

(In reply to Martin Steigerwald from comment #225)
> Chris, you referred me to this bug as I reported
>
> Bug 90835 - [4.1-rc6] gpu hang: ecode 6:-1:0x00000000, Kicking stuck
> semaphore on render ring
>
> I skimmed through it and it appears that there are some patches to test? But
> I am not sure which ones these are. Can you or someone else enlighten me?

There's likely a modest improvement in 4.2.

> Also I note that I still use
>
> Option "AccelMethod" "uxa"
>
> and I have
>
> martin@merkaba:~> cat /etc/modprobe.d/i915-kms.conf
> options i915 modeset=1 i915_enable_rc6=7

Fortuitously that dangerous option doesn't do anything for your kernel.

> thus maximum energy saving. But according to powertop it never enters the
> highest sleep state anyway.
>
> I will remove the AccelMethod setting now and see whether it helps. If not,
> I downgrade to 4.1-rc4 for now, as issues have been at least much less
> frequent with it.

Purely circumstantial.

> And its really that for me 4.1-rc6 makes things much *worse*. I am typing
> this after a clean reboot and already got the GPU hang again. It happens
> about every few minutes. Are you really sure this is the same GPU hang? I
> didn´t have this before 4.1 kernel?

Yes.

(In reply to Chris Wilson from comment #226)
> (In reply to Martin Steigerwald from comment #225)
> > Chris, you referred me to this bug as I reported
> >
> > Bug 90835 - [4.1-rc6] gpu hang: ecode 6:-1:0x00000000, Kicking stuck
> > semaphore on render ring
> >
> > I skimmed through it and it appears that there are some patches to test? But
> > I am not sure which ones these are. Can you or someone else enlighten me?
>
> There's likely a modest improvement in 4.2.

Nice.

> > Also I note that I still use
> >
> > Option "AccelMethod" "uxa"
> >
> > and I have
> >
> > martin@merkaba:~> cat /etc/modprobe.d/i915-kms.conf
> > options i915 modeset=1 i915_enable_rc6=7
>
> Fortuitously that dangerous option doesn't do anything for your kernel.

Well, I found out why: it seems I compiled i915 into the kernel; at least I don't have an i915 module in lsmod. But i915.i915_enable_rc6=7 on the kernel command line does not seem to have any effect either. I removed the option.
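
Options in /etc/modprobe.d only apply when i915 is loaded as a module, so it can be worth checking how the driver is built before tuning them. The sketch below is one way to tell the cases apart. The function name `i915_build_type` is hypothetical. It relies on loadable modules being listed in /proc/modules, and on built-in drivers that have parameters still getting a /sys/module/<name> directory.

```shell
# Hypothetical helper: report whether i915 is a loaded module, built in, or absent.
# Both paths can be overridden, which is mainly useful for testing.
i915_build_type() {
    modules_file="${1:-/proc/modules}"
    sys_dir="${2:-/sys/module/i915}"
    if grep -q '^i915 ' "$modules_file"; then
        echo module
    elif [ -d "$sys_dir" ]; then
        echo builtin
    else
        echo absent
    fi
}
```

If this prints `builtin`, driver parameters have to go on the kernel command line with the `i915.` prefix, as with `i915.semaphores=0` in the workaround at the top of this report.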

> > thus maximum energy saving. But according to powertop it never enters the
> > highest sleep state anyway.
> >
> > I will remove the AccelMethod setting now and see whether it helps. If not,
> > I downgrade to 4.1-rc4 for now, as issues have been at least much less
> > frequent with it.
>
> Purely circumstantial.

Since switching to SNA I haven't seen a GPU hang so far. Too early to say for sure, but it seems something in UXA may have triggered it more easily.

*** Bug 91212 has been marked as a duplicate of this bug. ***
