[regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

Reported by luca on 2013-03-02
This bug affects 396 people
Affects                       Status       Importance  Assigned to  Milestone
DRI                           In Progress  Medium
Linux                         Invalid      Medium
linux (Debian)                New          Unknown
linux (Fedora)                Unknown      Unknown
linux (Ubuntu)                             Critical    Unassigned
  Precise                                  Critical    Unassigned
  Quantal                                  Critical    Unassigned
  Raring                                   Critical    Unassigned
linux-lts-quantal (Ubuntu)                 Critical    Unassigned
  Precise                                  Critical    Unassigned
  Quantal                                  Critical    Unassigned
  Raring                                   Critical    Unassigned
linux-lts-raring (Ubuntu)                  Critical    Unassigned
  Precise                                  Critical    Unassigned
mesa (Ubuntu)                              Critical    Unassigned
  Precise                                  Critical    Unassigned

Bug Description

I'm getting errors about GPU hangs every minute or so (usually only when using FF and scrolling a webpage or something). I also get an annoying ubuntu dialog saying there is a "system error".

This didn't happen with 3.5.0-24-generic.

Here is the dmesg:
[15169.033709] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[15169.034517] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[15628.480216] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[15628.480570] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[15844.231372] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[15844.231773] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[20173.232593] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[20173.233211] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[26285.650393] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[26285.650980] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[26285.658405] ------------[ cut here ]------------
[26285.658472] WARNING: at /build/buildd/linux-3.5.0/drivers/gpu/drm/i915/intel_pm.c:2505 gen6_enable_rps+0x706/0x710 [i915]()
[26285.658474] Hardware name: SATELLITE Z830
[26285.658476] Modules linked in: sdhci_pci sdhci btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs reiserfs ext2 snd_hda_codec_hdmi snd_hda_codec_realtek joydev btusb coretemp kvm_intel kvm arc4 ghash_clmulni_intel aesni_intel cryptd aes_x86_64 snd_hda_intel snd_hda_codec snd_hwdep uvcvideo snd_pcm videobuf2_core microcode videodev bnep iwlwifi videobuf2_vmalloc snd_seq_midi psmouse videobuf2_memops snd_rawmidi rfcomm pcspkr snd_seq_midi_event serio_raw snd_seq bluetooth mac80211 snd_timer snd_seq_device i915 drm_kms_helper cfg80211 drm toshiba_acpi snd sparse_keymap soundcore wmi i2c_algo_bit toshiba_bluetooth snd_page_alloc parport_pc mei video mac_hid lpc_ich ppdev nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc lp parport e1000e ahci libahci [last unloaded: sdhci]
[26285.658537] Pid: 23433, comm: kworker/u:0 Not tainted 3.5.0-26-generic #40-Ubuntu
[26285.658539] Call Trace:
[26285.658549] [<ffffffff81051bef>] warn_slowpath_common+0x7f/0xc0
[26285.658553] [<ffffffff81051c4a>] warn_slowpath_null+0x1a/0x20
[26285.658569] [<ffffffffa02d32e6>] gen6_enable_rps+0x706/0x710 [i915]
[26285.658584] [<ffffffffa02bf3f6>] intel_modeset_init_hw+0x66/0xa0 [i915]
[26285.658595] [<ffffffffa02954b4>] i915_reset+0x1a4/0x6e0 [i915]
[26285.658601] [<ffffffff8101257b>] ? __switch_to+0x12b/0x420
[26285.658612] [<ffffffffa029a943>] i915_error_work_func+0xc3/0x110 [i915]
[26285.658618] [<ffffffff8107097a>] process_one_work+0x12a/0x420
[26285.658629] [<ffffffffa029a880>] ? gen6_pm_rps_work+0xe0/0xe0 [i915]
[26285.658632] [<ffffffff8107152e>] worker_thread+0x12e/0x2f0
[26285.658636] [<ffffffff81071400>] ? manage_workers.isra.26+0x200/0x200
[26285.658640] [<ffffffff81076023>] kthread+0x93/0xa0
[26285.658644] [<ffffffff8168a3e4>] kernel_thread_helper+0x4/0x10
[26285.658649] [<ffffffff81075f90>] ? kthread_freezable_should_stop+0x70/0x70
[26285.658652] [<ffffffff8168a3e0>] ? gs_change+0x13/0x13
[26285.658654] ---[ end trace 59c6162fdfcbffee ]---
[26756.021167] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[26756.021426] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[26766.014093] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[26766.014397] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[26932.376233] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[26932.376544] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[26932.384285] ------------[ cut here ]------------
[26932.384354] WARNING: at /build/buildd/linux-3.5.0/drivers/gpu/drm/i915/intel_pm.c:2505 gen6_enable_rps+0x706/0x710 [i915]()
[26932.384356] Hardware name: SATELLITE Z830
[26932.384358] Modules linked in: sdhci_pci sdhci btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs reiserfs ext2 snd_hda_codec_hdmi snd_hda_codec_realtek joydev btusb coretemp kvm_intel kvm arc4 ghash_clmulni_intel aesni_intel cryptd aes_x86_64 snd_hda_intel snd_hda_codec snd_hwdep uvcvideo snd_pcm videobuf2_core microcode videodev bnep iwlwifi videobuf2_vmalloc snd_seq_midi psmouse videobuf2_memops snd_rawmidi rfcomm pcspkr snd_seq_midi_event serio_raw snd_seq bluetooth mac80211 snd_timer snd_seq_device i915 drm_kms_helper cfg80211 drm toshiba_acpi snd sparse_keymap soundcore wmi i2c_algo_bit toshiba_bluetooth snd_page_alloc parport_pc mei video mac_hid lpc_ich ppdev nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc lp parport e1000e ahci libahci [last unloaded: sdhci]
[26932.384421] Pid: 24262, comm: kworker/u:2 Tainted: G W 3.5.0-26-generic #40-Ubuntu
[26932.384422] Call Trace:
[26932.384431] [<ffffffff81051bef>] warn_slowpath_common+0x7f/0xc0
[26932.384436] [<ffffffff81051c4a>] warn_slowpath_null+0x1a/0x20
[26932.384451] [<ffffffffa02d32e6>] gen6_enable_rps+0x706/0x710 [i915]
[26932.384466] [<ffffffffa02bf3f6>] intel_modeset_init_hw+0x66/0xa0 [i915]
[26932.384476] [<ffffffffa02954b4>] i915_reset+0x1a4/0x6e0 [i915]
[26932.384482] [<ffffffff8101257b>] ? __switch_to+0x12b/0x420
[26932.384493] [<ffffffffa029a943>] i915_error_work_func+0xc3/0x110 [i915]
[26932.384500] [<ffffffff8107097a>] process_one_work+0x12a/0x420
[26932.384511] [<ffffffffa029a880>] ? gen6_pm_rps_work+0xe0/0xe0 [i915]
[26932.384514] [<ffffffff8107152e>] worker_thread+0x12e/0x2f0
[26932.384517] [<ffffffff81071400>] ? manage_workers.isra.26+0x200/0x200
[26932.384521] [<ffffffff81076023>] kthread+0x93/0xa0
[26932.384526] [<ffffffff8168a3e4>] kernel_thread_helper+0x4/0x10
[26932.384531] [<ffffffff81075f90>] ? kthread_freezable_should_stop+0x70/0x70
[26932.384534] [<ffffffff8168a3e0>] ? gs_change+0x13/0x13
[26932.384536] ---[ end trace 59c6162fdfcbffef ]---
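
For anyone triaging similar logs, the hang timestamps in a dmesg dump like the one above can be extracted mechanically. The following is a minimal Python sketch and not part of the original report: the names `hang_times` and `hang_intervals` are hypothetical, and it assumes the kernel message contains the `[drm:i915_hangcheck_hung]` tag exactly as shown in this log.

```python
import re

# Matches lines such as:
# [15169.033709] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
# The pattern is not anchored, so a syslog prefix before the timestamp is tolerated.
HANG_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+\[drm:i915_hangcheck_hung\]")


def hang_times(dmesg_text):
    """Return the timestamps (seconds since boot) of each reported GPU hang."""
    return [float(m.group(1)) for m in HANG_RE.finditer(dmesg_text)]


def hang_intervals(times):
    """Return the gaps, in seconds, between consecutive hang timestamps."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    # Three lines copied from the log above.
    sample = (
        "[15169.033709] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung\n"
        "[15169.034517] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off\n"
        "[15628.480216] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung\n"
    )
    times = hang_times(sample)
    print(len(times), "hangs; intervals in seconds:", hang_intervals(times))
```

Feeding it the full output of `dmesg` gives a quick picture of how often the hangs recur, which may help when comparing kernels.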

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: linux-image-3.5.0-26-generic 3.5.0-26.40
ProcVersionSignature: Ubuntu 3.5.0-26.40-generic 3.5.7.6
Uname: Linux 3.5.0-26-generic x86_64
ApportVersion: 2.6.1-0ubuntu10
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: luca 2084 F.... pulseaudio
CheckboxSubmission: f8b82cd9bc23fe075e5068a9824afda5
CheckboxSystem: b1865df84255b8716d3bcc269ff410d1
Date: Sat Mar 2 22:25:14 2013
HibernationDevice: RESUME=UUID=20fe6da8-7d68-4660-953f-6e4ae1d348a7
InstallationDate: Installed on 2012-04-26 (310 days ago)
InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Release amd64 (20120425)
MachineType: TOSHIBA SATELLITE Z830
MarkForUpload: True
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.5.0-26-generic root=UUID=36929bf3-a158-44d9-a80d-3adac2840fa8 ro quiet splash acpi_backlight=vendor i915.i915_enable_rc6=1 i915.lvds_downclock=1 vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.5.0-26-generic N/A
 linux-backports-modules-3.5.0-26-generic N/A
 linux-firmware 1.95
SourcePackage: linux
UpgradeStatus: Upgraded to quantal on 2012-10-28 (125 days ago)
dmi.bios.date: 07/31/2012
dmi.bios.vendor: TOSHIBA
dmi.bios.version: Version 1.70
dmi.board.asset.tag: 0000000000
dmi.board.name: Portable PC
dmi.board.vendor: TOSHIBA
dmi.board.version: Version A0
dmi.chassis.asset.tag: 0000000000
dmi.chassis.type: 10
dmi.chassis.vendor: TOSHIBA
dmi.chassis.version: Version 1.0
dmi.modalias: dmi:bvnTOSHIBA:bvrVersion1.70:bd07/31/2012:svnTOSHIBA:pnSATELLITEZ830:pvrPT22LE-00300GGR:rvnTOSHIBA:rnPortablePC:rvrVersionA0:cvnTOSHIBA:ct10:cvrVersion1.0:
dmi.product.name: SATELLITE Z830
dmi.product.version: PT22LE-00300GGR
dmi.sys.vendor: TOSHIBA

luca (llucax) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed

Another note: when these hangs happen, I get graphics corruption (usually in the fonts/text).

luca (llucax) wrote :

kernel 3.5.0-25-generic also seems to work fine.

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: regression-update
luca (llucax) wrote :

After resuming from suspend with kernel 3.5.0-25-generic, I again got the annoying dialogs saying a GPU hang was detected and asking me to report a bug (I have no idea where that report goes), but I can't see anything strange in dmesg [1]. How can I find out why those dialogs are being opened, to check whether something is actually wrong?

I got the annoying dialog several times in a very short period, about 10 times in roughly 5 minutes, and then it stopped. After that I suspended and resumed my laptop a couple of times and it hasn't happened again so far.

[1] Except for messages like the following, but I've been getting these since I bought this computer about a year ago and never had those annoying dialogs about any GPU hang:
[52682.020386] CPU1: Package power limit notification (total events = 5770)
[52682.020389] CPU3: Package power limit notification (total events = 5769)
[52682.020391] CPU2: Package power limit notification (total events = 5761)
[52682.020393] CPU0: Package power limit notification (total events = 5746)
[52682.021517] CPU3: Package power limit normal
[52682.021520] CPU1: Package power limit normal
[52682.021521] CPU2: Package power limit normal
[52682.021526] CPU0: Package power limit normal

luca (llucax) wrote :

Also, I couldn't see any graphic corruption this last time with kernel 3.5.0-25

Craig McQueen (cmcqueen1975) wrote :

This affects me, but in my case I'm running Ubuntu 12.04, and the problem seems to be with kernel 3.2.0-39. Booting to kernel 3.2.0-38 seems to have fixed it.

Hans (old-man999) wrote :
luca (llucax) wrote :

Seems to be fixed in linux-image-3.5.0-26-generic 3.5.0-26.42

luca (llucax) wrote :

Nope, still getting it with linux-image-3.5.0-26-generic 3.5.0-26.42

[32861.907463] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[32861.907470] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[32861.911988] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
...
[39199.903510] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[39199.903846] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

Same here: the previous kernel 3.5.0-25-generic works without problems, but 3.5.0-26.42 hung just now:

$ dmesg|grep i915
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.5.0-26-generic root=/dev/mapper/System-root ro quiet splash i915.i915_enable_rc6=1 vt.handoff=7
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.5.0-26-generic root=/dev/mapper/System-root ro quiet splash i915.i915_enable_rc6=1 vt.handoff=7
[ 1.667363] i915 0000:00:02.0: power state changed by ACPI to D0
[ 1.667367] i915 0000:00:02.0: power state changed by ACPI to D0
[ 1.667652] i915 0000:00:02.0: setting latency timer to 64
[ 1.687950] i915 0000:00:02.0: irq 44 for MSI/MSI-X
[ 2.429882] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[ 330.684154] i915 0000:00:02.0: power state changed by ACPI to D3
[ 331.826825] i915 0000:00:02.0: power state changed by ACPI to D0
[ 331.826829] i915 0000:00:02.0: power state changed by ACPI to D0
[ 331.826830] i915 0000:00:02.0: setting latency timer to 64
[ 1677.075872] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1677.075876] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

Content of i915_error_state attached.

Will disable rc6 and test what happens then.

With rc6 off the hangup happened 2 minutes after booting:

$ dmesg|grep i915
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.5.0-26-generic root=/dev/mapper/System-root ro quiet splash i915.i915_enable_rc6=0 vt.handoff=7
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.5.0-26-generic root=/dev/mapper/System-root ro quiet splash i915.i915_enable_rc6=0 vt.handoff=7
[ 0.857239] i915 0000:00:02.0: power state changed by ACPI to D0
[ 0.857242] i915 0000:00:02.0: power state changed by ACPI to D0
[ 0.857458] i915 0000:00:02.0: setting latency timer to 64
[ 0.877771] i915 0000:00:02.0: irq 44 for MSI/MSI-X
[ 1.619983] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[ 128.787009] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 128.787013] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 254.699283] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

Seems it's time to return to 3.5.0-25-generic.
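
When comparing runs like the two above, it helps to confirm which value of `i915.i915_enable_rc6` was actually in effect on the command line. Below is a minimal Python sketch, assuming the command line is available as a plain string; the helper name `module_param` is hypothetical and not an existing tool.

```python
def module_param(cmdline, name):
    """Return the value given for `name=value` on a kernel command line.

    Returns None if the parameter is absent or has no `=value` part.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            # Assumption of this sketch: if the parameter is repeated,
            # report the last occurrence.
            value = val
    return value


if __name__ == "__main__":
    # Command line copied from the dmesg output above.
    cmdline = (
        "BOOT_IMAGE=/vmlinuz-3.5.0-26-generic root=/dev/mapper/System-root "
        "ro quiet splash i915.i915_enable_rc6=0 vt.handoff=7"
    )
    print("rc6 setting:", module_param(cmdline, "i915.i915_enable_rc6"))
```

On a running system the same string can be read from `/proc/cmdline` instead of being pasted in.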

Laurent (l-perlat) wrote :

Same problem here :

Visual corruptions + "GPU hang" error when scrolling in Firefox with 3.5.0-26.

Everything back to normal on 3.5.0-25 (Linux 3.5.0-25-generic #39-Ubuntu SMP Mon Feb 25 18:26:58 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux)

tobyS (tobias-schlitt) wrote :

Besides visual corruption and hangs, I also experience complete system hang-ups (no reaction until a hard reboot) and occasional kernel panics. I therefore wonder why this report does not receive higher priority.

czigor (czigor) on 2013-03-20
summary: - [regression] 3.5.0-26-generic CPU hangs
+ [regression] 3.5.0-26-generic GPU hangs

Same issue here. Happens very often during Firefox usage, but also on other occasions.

[51278.392895] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[51278.392901] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[51278.397785] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

shuerhaaken (shkn) wrote :

This is really getting annoying, is anybody taking care of this?

czigor (czigor) wrote :

@shkn:
Using 3.5.0-25-generic made my PC usable again. I get an error message only at login.

Timo Aaltonen (tjaalton) wrote :

it's one of these commits (from the quantal kernel), likely the top one since it's happening on sandybridge:

817e8fdee14b05d drm/i915: Implement WaDisableHiZPlanesWhenMSAAEnabled
4c443ec9afe7f6f drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for scanline waits
f534135423c7028 drm/i915: Disable AsyncFlip performance optimisations
c0c1fd8a18479f0 drm/i915: Invalidate the relocation presumed_offsets along the slow path

Changed in linux (Ubuntu):
assignee: nobody → Ubuntu Kernel Team (ubuntu-kernel-team)
importance: Medium → Critical
Changed in linux (Ubuntu Quantal):
importance: Undecided → Critical
status: New → Confirmed
Changed in linux (Ubuntu Precise):
importance: Undecided → Critical
status: New → Confirmed
Timo Aaltonen (tjaalton) wrote :

note that I'm not sure it's affecting raring, maybe not.

Adam Conrad (adconrad) on 2013-03-22
Changed in linux (Ubuntu Precise):
status: Confirmed → Invalid
Changed in linux-lts-quantal (Ubuntu Precise):
status: New → Confirmed
Changed in linux-lts-quantal (Ubuntu Quantal):
status: New → Invalid
Changed in linux-lts-quantal (Ubuntu Raring):
status: New → Invalid
Changed in linux-lts-quantal (Ubuntu Precise):
importance: Undecided → Critical
Changed in linux (Ubuntu Precise):
importance: Critical → Undecided
tags: added: performing-bisect
Joseph Salisbury (jsalisbury) wrote :

I'd like to perform a kernel bisect to identify the exact commit that introduced this regression. However, it would be good to test the latest mainline and a test kernel with commit 817e8fdee14b05d reverted.

The latest mainline kernel can be downloaded from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc3-raring/

Can folks affected by this bug test the v3.9-rc3 kernel?

One thing to note, you will need to install both the linux-image and linux-image-extra .deb packages.

I will also build a Quantal test kernel with commit 817e8fdee14b05d reverted and post a link shortly.

Thanks in advance!

Joseph Salisbury (jsalisbury) wrote :

I built a Quantal test kernel with commit 817e8fdee14b05d reverted. The kernel can be downloaded from:
http://people.canonical.com/~jsalisbury/lp1140716/

Can folks affected by this bug test this kernel and report back if it fixes the issue?

Gard Spreemann (gspreemann) wrote :

@jsalisbury: I could not successfully test the kernel you linked to in comment #22, as it rendered my system unusable. X started at 640x480, there was no working keyboard/mouse, and I could not SSH in.

franglais.125 (santibatista) wrote :

@jsalisbury: Thanks for pointing to this kernel version. I have been able to successfully test kernel v3.9-rc3 on Precise 12.04.2 (I am running with quantal-lts xorg stack).
I have been running on it for ~ 3 hours so far with success. It usually took some time for me to hit this bug on my Dell V131, so some more testing might be required.
I will report back if I hit the bug again. So far so good.

luca (llucax) wrote :

Also initial success for now. Still getting the annoying dialog at startup though (but no signs of GPU hangs in dmesg).

@jsalisbury: I tested your kernel 3.5.0-27-generic #45~lp1140716v1 (from comment #22); it was no improvement for my system. I got two hang-ups within the first hour (one S3 cycle at 1985); the second one forced me to turn off the system:

[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.5.0-27-generic root=/dev/mapper/System-root ro quiet splash i915.i915_enable_rc6=1 vt.handoff=7
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.5.0-27-generic root=/dev/mapper/System-root ro quiet splash i915.i915_enable_rc6=1 vt.handoff=7
[ 0.804805] i915 0000:00:02.0: power state changed by ACPI to D0
[ 0.804809] i915 0000:00:02.0: power state changed by ACPI to D0
[ 0.805030] i915 0000:00:02.0: setting latency timer to 64
[ 0.824988] i915 0000:00:02.0: irq 43 for MSI/MSI-X
[ 1.563280] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[ 1894.853449] i915 0000:00:02.0: power state changed by ACPI to D3
[ 1896.202702] i915 0000:00:02.0: power state changed by ACPI to D0
[ 1896.202708] i915 0000:00:02.0: power state changed by ACPI to D0
[ 1896.202720] i915 0000:00:02.0: setting latency timer to 64
[ 1984.429241] i915 0000:00:02.0: power state changed by ACPI to D3
[ 1985.767157] i915 0000:00:02.0: power state changed by ACPI to D0
[ 1985.767160] i915 0000:00:02.0: power state changed by ACPI to D0
[ 1985.767168] i915 0000:00:02.0: setting latency timer to 64
[ 2132.278551] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 2132.278555] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 3504.895781] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

luca (llucax) wrote :

torsten, maybe you are having a different issue; note that your hang doesn't look related to the rc6 state.

 [51278.397785] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

BTW, my system is still surviving without hangs with the patched 3.5 kernel.

luca (llucax) wrote :

I just had a burst of dialogs reporting non-existent GPU hangs (with the patched 3.5 kernel). The GPU hangs are not reported in dmesg though, so I don't know where the dialogs are getting them from. There was also no corruption or anything. It seems the dialog madness starts when an unrelated program crashes. Maybe it's just an apport bug? How should I proceed to see what's really going on?

@jsalisbury: I've been running your 3.5.0-27-generic #45~lp1140716v1 for 5 hours and I've already had 3 hangs. No improvement here.

[ 5733.121323] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 5733.121330] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 5733.124957] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

luca (llucax) wrote :

OK, it took a while, but I finally got the GPU hang with kernel 3.5.0-27-generic #45~lp1140716v1:

[22344.085044] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[22344.085051] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[22344.090106] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

luca (llucax) wrote :

It always happens with Firefox, and only with certain sites (consistently).

luca (llucax) wrote :

I got this with a second hang:

[22344.085044] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[22344.085051] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[22344.090106] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[23652.138382] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[23652.138898] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[23652.146420] ------------[ cut here ]------------
[23652.146491] WARNING: at /home/jsalisbury/bugs/lp1140716/ubuntu-quantal/drivers/gpu/drm/i915/intel_pm.c:2505 gen6_enable_rps+0x706/0x710 [i915]()
[23652.146495] Hardware name: SATELLITE Z830
[23652.146497] Modules linked in: sdhci_pci sdhci snd_hda_codec_hdmi snd_hda_codec_realtek joydev btusb coretemp kvm_intel kvm ghash_clmulni_intel aesni_intel arc4 cryptd aes_x86_64 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm uvcvideo videobuf2_core videodev snd_seq_midi videobuf2_vmalloc videobuf2_memops snd_rawmidi microcode snd_seq_midi_event iwlwifi snd_seq snd_timer snd_seq_device i915 bnep rfcomm mac80211 toshiba_acpi sparse_keymap drm_kms_helper wmi toshiba_bluetooth snd pcspkr bluetooth drm i2c_algo_bit cfg80211 soundcore psmouse mac_hid snd_page_alloc video serio_raw mei lpc_ich parport_pc ppdev nfsd nfs lockd fscache auth_rpcgss nfs_acl lp sunrpc parport ahci libahci e1000e [last unloaded: sdhci]
[23652.146578] Pid: 3451, comm: kworker/u:0 Not tainted 3.5.0-27-generic #45~lp1140716v1
[23652.146581] Call Trace:
[23652.146592] [<ffffffff81051bef>] warn_slowpath_common+0x7f/0xc0
[23652.146599] [<ffffffff81051c4a>] warn_slowpath_null+0x1a/0x20
[23652.146621] [<ffffffffa03f6316>] gen6_enable_rps+0x706/0x710 [i915]
[23652.146640] [<ffffffffa03e2446>] intel_modeset_init_hw+0x66/0xa0 [i915]
[23652.146655] [<ffffffffa03b84b4>] i915_reset+0x1a4/0x6e0 [i915]
[23652.146663] [<ffffffff8101257b>] ? __switch_to+0x12b/0x420
[23652.146679] [<ffffffffa03bd943>] i915_error_work_func+0xc3/0x110 [i915]
[23652.146688] [<ffffffff8107098a>] process_one_work+0x12a/0x420
[23652.146701] [<ffffffffa03bd880>] ? gen6_pm_rps_work+0xe0/0xe0 [i915]
[23652.146707] [<ffffffff8107153e>] worker_thread+0x12e/0x2f0
[23652.146712] [<ffffffff81071410>] ? manage_workers.isra.26+0x200/0x200
[23652.146719] [<ffffffff81076033>] kthread+0x93/0xa0
[23652.146726] [<ffffffff8168ab24>] kernel_thread_helper+0x4/0x10
[23652.146732] [<ffffffff81075fa0>] ? kthread_freezable_should_stop+0x70/0x70
[23652.146737] [<ffffffff8168ab20>] ? gs_change+0x13/0x13
[23652.146740] ---[ end trace 2153106cc632835c ]---

Gard Spreemann (gspreemann) wrote :

I'm confused as to where the commits referenced by tjaalton in comment #19 live, but for what it's worth, I seem to have a stable system after applying reverse diffs of the following commits from the linux-3.5.y branch of git://kernel.ubuntu.com/ubuntu/linux.git to the 3.5.0-27.45 sources:

2964148 - drm/i915: Implement WaDisableHiZPlanesWhenMSAAEnabled
899b550 - drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for scanline waits

Just reverting the first, or using jsalisbury's kernel from comment #22 (ignore my comment #23, I was being an idiot and forgot the modules) gives me a GPU hang and/or graphics corruption within minutes, especially quickly if opening Firefox. After reverting both of the above, I haven't been able to hang the system yet.

Kernel 3.9.0-030900rc3-generic from comment #21 is much more stable for me, no problems so far after 4h of operation.

franglais.125 (santibatista) wrote :

@jsalisbury: After a few days of use and many suspend-resume cycles, I am yet to encounter a problem with kernel 3.9-rc3 (as indicated in comment #21). No problems whatsoever on my Dell v131 (i5 Sandybridge)...

Peter Saunderson (peteasa) wrote :

I got this a lot with kernel 3.5.0-26-generic and used a quick workaround to avoid the problem: http://askubuntu.com/questions/225356/how-can-i-enable-the-sna-acceleration-method-for-intel-cards-under-ubuntu-12-04

SNA does not seem to have the same issue, only UXA. If I have time I can try a new kernel, but I have already spent so much time on this that it may be a few days before I get to it.

Max Rameau (afrimax-e) wrote :

I had the problem right after updating (not upgrading) on Saturday using 12.04.

I was able to control it by logging into 2D and immediately opening the System Monitor and shutting down the three instances of Ubuntu One (login, synch and launch), because the machine would freeze upon login to Ubuntu One. I then had to shut down zeitgeist-fts, because that would start eating up resources (up to 300 MB of memory at one point).

At that point, I just decided to reinstall with 12.10. I did that and it worked fine for an hour, so I started transferring my backed-up files and logged into Ubuntu One while running the updates. The problems began immediately, including resource use going up to 100% for long periods of time, mainly through the multiplication of the gkts (?) service. It only used 3.8 MB at a time, but at one point there were 20 instances of it open. I concluded it was Ubuntu One causing the problem, so I reinstalled again, this time not logging into Ubuntu One. No problems for 4 hours, even as I installed software. Then I ran the automatic software update, and the problems began again immediately.

Constant crashing, crazy graphic corruption and other issues. Ran the system log and got the similar error:

kernel [224.243459] [drm: Enable RC6 States: RC6 off, RC6p off, RC6pp off]
kernel [246.465377] [drm: i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed GPU hung

etc., etc.

This is a nightmare. Need a fix.

Matthew Eaton (powder) wrote :

Test kernel did not fix the issue for me.

Linux matt-work 3.5.0-27-generic #45~lp1140716v1 SMP Fri Mar 22 15:50:00 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Mar 25 08:12:15 matt-work kernel: [ 158.302349] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 25 08:12:15 matt-work kernel: [ 158.302353] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Mar 25 08:12:15 matt-work kernel: [ 158.305230] [drm] Enabling RC6 states: RC6 off, RC6p off, RC6pp off
Mar 25 08:12:36 matt-work kernel: [ 179.663557] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 25 08:12:36 matt-work kernel: [ 179.663780] [drm] Enabling RC6 states: RC6 off, RC6p off, RC6pp off

Joseph Salisbury (jsalisbury) wrote :

Thanks, everyone, for testing. So it sounds like my test kernel did not fix this bug. However, it sounds like this bug is fixed in the v3.9 mainline kernel, at least in rc3.

I can perform a "Reverse" kernel bisect to identify the commit that fixes this bug. It will first require us to identify the first v3.9 release candidate that does not exhibit this bug.

We know that it is fixed in rc3, so it would be good to test rc1 and rc2. Can folks affected by this bug test those two release candidates:

v3.9-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc1-raring/
v3.9-rc2: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc2-raring/
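
The "reverse" search described above amounts to a binary search for the first good build in an ordered list. Here is a minimal Python sketch under stated assumptions: builds are ordered oldest to newest, every build after the first good one is also good, and the newest build is known to be good. The function `first_good`, the `is_good` callback (which stands in for a manual test run), and the build names in the example are all hypothetical.

```python
def first_good(builds, is_good):
    """Return the earliest build for which is_good(build) is true.

    Assumes `builds` is ordered oldest to newest, that all builds after
    the first good one are also good, and that the last build is good.
    """
    lo, hi = 0, len(builds) - 1  # hi always points at a known-good build
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(builds[mid]):
            hi = mid       # the fix is at mid or earlier
        else:
            lo = mid + 1   # the fix landed after mid
    return builds[lo]


if __name__ == "__main__":
    # Placeholder build names and a made-up outcome, for illustration only.
    builds = ["build-1", "build-2", "build-3", "build-4", "build-5"]
    fixed_from = 2  # pretend the fix first appears in build-3
    print(first_good(builds, lambda b: builds.index(b) >= fixed_from))
```

Each call to `is_good` corresponds to installing one kernel and running it long enough to trust the result, so halving the range keeps the number of test installs small.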

Matthew Eaton (powder) wrote :

I've been on the rc1 kernel for about 3 hours with no problem.

Linux matt-work 3.9.0-030900rc1-generic #201303060659 SMP Wed Mar 6 12:00:04 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

tags: added: kernel-key
Robert Hooker (sarvatt) on 2013-04-02
Changed in linux (Ubuntu Precise):
status: Invalid → Confirmed
importance: Undecided → Critical
Changed in linux (Ubuntu Raring):
assignee: Ubuntu Kernel Team (ubuntu-kernel-team) → nobody
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Robert Hooker (sarvatt) on 2013-04-02
summary: - [regression] 3.5.0-26-generic GPU hangs
+ [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs
summary: - [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs
+ [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on
+ Sandybridge
Robert Hooker (sarvatt) on 2013-04-02
Changed in linux (Ubuntu Raring):
status: Confirmed → Invalid
Changed in linux (Ubuntu Quantal):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Precise):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Raring):
assignee: Canonical Kernel Team (canonical-kernel-team) → nobody
tags: added: apport-collected
Aymeric PETIT (mulx) on 2013-04-09
tags: added: precise
tags: removed: kernel-key
tags: removed: performing-bisect
Tim Gardner (timg-tpi) on 2013-04-09
Changed in linux (Ubuntu Quantal):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Precise):
status: Confirmed → Fix Committed
Steve Conklin (sconklin) on 2013-04-15
tags: added: verification-needed-precise
tags: added: verification-needed-quantal
tags: added: verification-done-precise
removed: verification-needed-precise
tags: added: verification-done-quantal
removed: verification-needed-quantal
tags: added: verification-failed-precise
removed: verification-done-precise
tags: added: verification-done-precise
removed: verification-failed-precise
Sergio (sergio-otero) on 2013-04-23
Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released
Brad Figg (brad-figg) on 2013-04-23
Changed in linux (Ubuntu Precise):
status: Fix Released → Fix Committed
Changed in linux (Ubuntu Raring):
status: Invalid → New
Changed in linux (Ubuntu Raring):
status: New → Confirmed
tags: added: kernel-da-key
Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released
Changed in linux-lts-quantal (Ubuntu Precise):
status: Confirmed → Fix Released
Changed in linux (Ubuntu Quantal):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Raring):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
tags: added: kernel-stable-key
Changed in linux (Debian):
status: Unknown → New
Changed in linux:
importance: Unknown → Medium
status: Unknown → Invalid
Changed in dri:
importance: Unknown → Medium
status: Unknown → Confirmed
no longer affects: linux-lts-raring (Ubuntu Quantal)
no longer affects: linux-lts-raring (Ubuntu Raring)
tags: added: patch
Changed in linux-lts-raring (Ubuntu Precise):
status: New → Confirmed
Changed in linux-lts-raring (Ubuntu):
status: New → Confirmed
gokul (gokulnathonline) on 2013-10-08
information type: Public → Public Security
information type: Public Security → Public

Hello. Same problem here.

[ 485.443455] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 485.443467] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[ 485.452727] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0xa637000 ctx 1) at 0xa6371c8
[ 821.726799] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 821.726873] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x4974000 ctx 1) at 0x49741c8
[ 1311.134514] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[ 1311.134613] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x4a98000 ctx 1) at 0x4a98220

System: Fedora 19, 64-bit
Linux jarvis 3.11.2-201.fc19.x86_64 #1 SMP Fri Sep 27 19:20:55 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

WM: KDE with effects enabled

8G ram
300G SATA HDD
Notebook: Lenovo ThinkPad E320

The problem occurs when:
- scrolling in Firefox
- playing video in vlc and switching to the KDE terminal or another app
- sometimes the system hangs with the CPU at 100%, freezes, and needs a hard reboot
- sometimes it happens even if I work only in Firefox or a terminal (very frustrating)
- it has been happening across many kernel versions, from 3.0 to the newest, I think

lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b4)
00:1c.1 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 2 (rev b4)
00:1c.2 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 3 (rev b4)
00:1c.5 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 6 (rev b4)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation HM65 Express Chipset Family LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 04)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 04)
02:00.0 Network controller: Intel Corporation Centrino Wireless-N 1000 [Condor Peak]
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader (rev 01)
03:00.1 SD Host controller: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader (rev 01)
08:00.0 Ethernet controller: Qualcomm Atheros AR8151 v2.0 Gigabit Ethernet (rev c0)

(In reply to comment #110)
> Hello. Same problem here.
>
> [ 485.443455] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> [ 485.443467] [drm] capturing error event; look for more information in
> /sys/kernel/debug/dri/0/i915_error_state
> [ 485.452727] [drm:i915_set_reset_status] *ERROR* render ring hung inside
> bo (0xa637000 ctx 1) at 0xa6371c8

Unlikely that this is the same gpu hang. Please file a new bug report and attach the error state.

Leonid Evdokimov (darkk) wrote :

Once again on 3.8.0-31-generic on raring
dmesg, i915_error_state and Xorg.0.log are at http://yadi.sk/d/psMKfJZi9cfoa under 2013-10-10 subfolder

Mathias Dietrich (theghost) wrote :

FYI, this also affects Saucy with 3.8.0-30-generic, Mesa 9.2.1 and Intel DRI 2.99.904.
I also observed that after switching to Mesa 9.2 the number of lockups increased considerably.
Additionally, with the newer drivers installed there are no complete system lockups anymore,
just the VT switching, which is still very annoying in games.

tags: added: saucy

Just a few remarks.
I still see this bug with Kernel 3.8, Mesa 9.2.1 and DRI 2.99.904.
Moreover, with switching from Mesa 9.1.x to Mesa 9.2.x the number of lockups highly increased (especially in games).
Additionally with running the latest drivers complete system lockups are gone, but it's still a lockup for multiple seconds with following VT switching.
Maybe these observations help somehow.

(In reply to comment #112)
> Just a few remarks.
> I still see this bug with Kernel 3.8, Mesa 9.2.1 and DRI 2.99.904.
> Moreover, with switching from Mesa 9.1.x to Mesa 9.2.x the number of lockups
> highly increased (especially in games).

On snb the blorp engine in mesa has become a bit more hang-happy, see bug #70151
Not all gpu hangs are created equal ;-)

> Additionally with running the latest drivers complete system lockups are
> gone, but it's still a lockup for multiple seconds with following VT
> switching.

You mean a gpu hang happens while doing a vt switch?

(In reply to comment #113)
> On snb the blorp engine in mesa has become a bit more hang-happy, see bug
> #70151
> Not all gpu hangs are created equal ;-)
>

Actually it was on Sandybridge.

> You mean a gpu hang happens while doing a vt switch?

No, I meant that if you suffer a lockup, you just have to wait a few seconds, switch to another VT and back, and then you can resume using your system (although sometimes fonts are broken).

Peter Silva (peter-bsqt) wrote :

On Saucy. When I run minecraft, after a few minutes, it locks up the screen.
Ctrl-Alt-F1, to get to a tty... I see:

Hang check elapsed *ERROR* stuck on render ring.
render ring stuck inside bo (0xaf4d000 ctx 1) at 0xaf4d1d8

this happens every couple of minutes...

The crash detection happens, and it tries to report, over and over again, but the report never succeeds. I do not think the reports are getting through. FWIW, I am patched up to today, launch day (2013/10/17), and it still happens.


Created attachment 87857
i915_error_state

I also hit this bug while watching video in mplayer. It happens every 1-2 hours.

[40787.765816] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[40787.765852] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[40787.772361] [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x1fb63000 ctx 1) at 0x1fb63220

Created attachment 87858
X -version output

(In reply to comment #115)
> Created attachment 87857 [details]
> i915_error_state
>
> I also hit this bug while watching video in mplayer. It happens every 1-2
> hours.
>
> [40787.765816] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
> [40787.765852] [drm] capturing error event; look for more information in
> /sys/kernel/debug/dri/0/i915_error_state
> [40787.772361] [drm:i915_set_reset_status] *ERROR* render ring hung inside
> bo (0x1fb63000 ctx 1) at 0x1fb63220

This looks like bug #70151, but is definitely not this bug here.

Changed in linux-lts-quantal (Ubuntu):
importance: Undecided → Critical
Changed in linux-lts-quantal (Ubuntu Quantal):
importance: Undecided → Critical
Changed in linux-lts-quantal (Ubuntu Raring):
importance: Undecided → Critical
Changed in linux-lts-raring (Ubuntu):
importance: Undecided → Critical
Changed in linux-lts-raring (Ubuntu Precise):
importance: Undecided → Critical
Changed in linux-lts-quantal (Ubuntu Precise):
status: Fix Released → Invalid
Changed in linux (Ubuntu Raring):
status: Confirmed → Invalid
Changed in linux (Ubuntu Quantal):
status: Fix Released → Invalid
whiskers75 (whiskers75) wrote :

Happens here while gaming or during other graphics-intensive tasks. Lenovo G570.

Changed in linux (Ubuntu Precise):
status: Fix Released → Invalid
Changed in linux-lts-raring (Ubuntu):
status: Confirmed → Triaged
Changed in linux-lts-raring (Ubuntu):
status: Triaged → Invalid
Changed in linux-lts-raring (Ubuntu Precise):
status: Confirmed → Invalid
Changed in mesa (Ubuntu):
importance: Undecided → Critical
status: New → Triaged
Changed in linux (Ubuntu Precise):
assignee: Canonical Kernel Team (canonical-kernel-team) → nobody
Changed in linux (Ubuntu Quantal):
assignee: Canonical Kernel Team (canonical-kernel-team) → nobody
Changed in linux (Ubuntu Raring):
assignee: Canonical Kernel Team (canonical-kernel-team) → nobody
Changed in mesa (Ubuntu Precise):
status: New → Triaged
importance: Undecided → Critical

Since this bug:

- Is invalid for Linux upstream, it is also invalid downstream.
- Is confirmed for DRI upstream, the real affected package is "mesa (Ubuntu)".

Mathias Dietrich (theghost) wrote :

Correct me if I am wrong, but if it's an Intel DRI bug (and that's what it is), wouldn't that mean "xserver-xorg-video-intel" is the "real affected" package?

Chris Wilson (ickle) wrote :

No, the original issue (still unresolved) here is in the hardware, which makes it a kernel problem. However, lots of *different* bugs have also been reported here that are due to regressions in mesa/i965.

The attachment "0001-drm-i915-Fix-gen6-SNB-missed-BLT-ring-interrupts.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

Created attachment 89314
i915_error_state (kernel 3.11.6, mesa 9.2.2, xf86-video-intel 2.99.906)

GPU hangs after playing hedgewars for a few minutes. Thinkpad T420 laptop, i5-2520M.
dmesg error message:
[16901.286432] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[16901.286441] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring
[16901.286444] [drm] capturing error event; look for more information in /sys/kernel/debug/dri/0/i915_error_state
[16908.287504] [drm:i915_hangcheck_elapsed] *ERROR* stuck on render ring
[16908.287508] [drm:i915_hangcheck_elapsed] *ERROR* stuck on blitter ring
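
Since several distinct hangs are being reported against this one bug, it can help to summarise which ring and buffer object each dmesg line refers to before comparing reports. Below is a minimal Python sketch that does this. The regular expressions are modelled only on the dmesg lines quoted in this report, so other kernel versions may word the messages differently. The function name `parse_hang_lines` and the `offset` field (the "at" address minus the bo address) are illustrative assumptions, not anything the kernel or the i915 developers define.

```python
import re

# Patterns modelled only on the dmesg lines quoted in this report;
# other kernel versions may phrase these messages differently.
STUCK_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\[drm[^\]]*\].*?stuck on (\w+) ring"
)
HUNG_BO_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\[drm[^\]]*\].*?(\w+) ring hung inside bo "
    r"\((0x[0-9a-fA-F]+) ctx (\d+)\) at (0x[0-9a-fA-F]+)"
)


def parse_hang_lines(lines):
    """Extract hang events from an iterable of dmesg lines."""
    events = []
    for line in lines:
        m = HUNG_BO_RE.search(line)
        if m:
            ts, ring, bo, ctx, addr = m.groups()
            events.append({
                "time": float(ts),
                "kind": "hung_in_bo",
                "ring": ring,
                "bo": int(bo, 16),
                "ctx": int(ctx),
                # Assumption: both addresses share one address space, so
                # the difference is the position inside the buffer object.
                "offset": int(addr, 16) - int(bo, 16),
            })
            continue
        m = STUCK_RE.search(line)
        if m:
            events.append({
                "time": float(m.group(1)),
                "kind": "stuck",
                "ring": m.group(2),
            })
    return events
```

Feeding it the output of `dmesg` split into lines gives a list of dictionaries that can be grouped by ring. This only summarises the log; the error state dump is still what the developers ask for.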

*** Bug 71890 has been marked as a duplicate of this bug. ***

*** Bug 72048 has been marked as a duplicate of this bug. ***

*** Bug 72829 has been marked as a duplicate of this bug. ***

*** Bug 73659 has been marked as a duplicate of this bug. ***

Created attachment 92710
i915_error_state

I'm also getting regular Sandybridge GPU lockups with Mesa 10.0.1 and Linux kernel 3.13.

dmesg output:

[ 918.876872] [drm] stuck on render ring
[ 918.876876] [drm] stuck on blitter ring
[ 918.876878] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 918.876879] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 918.876879] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 918.876880] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 918.876880] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 932.923240] [drm] stuck on render ring
[ 932.923242] [drm] stuck on blitter ring

Unfortunately the crash dump doesn't help - it's an empty file!
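
To avoid attaching an empty dump, the file can be read and checked before it is uploaded. The following is a small Python sketch of that check. The two paths are the ones printed by the kernel messages quoted in this report; the function name `find_usable_error_state` is an illustrative assumption, and reading these files usually requires root.

```python
# Both paths are taken from the kernel messages quoted in this report.
CANDIDATE_PATHS = (
    "/sys/class/drm/card0/error",
    "/sys/kernel/debug/dri/0/i915_error_state",
)


def find_usable_error_state(paths=CANDIDATE_PATHS):
    """Return (path, data) for the first readable, non-empty dump, else None."""
    for path in paths:
        try:
            with open(path, "rb") as fh:
                data = fh.read()
        except OSError:
            # Missing file, debugfs not mounted, or insufficient permissions.
            continue
        if data.strip():
            return path, data
    return None
```

The sketch reads the file instead of checking its size, because the size reported for sysfs and debugfs entries may not reflect how much data a read actually returns.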

*** Bug 74180 has been marked as a duplicate of this bug. ***

*** Bug 74265 has been marked as a duplicate of this bug. ***

*** Bug 74452 has been marked as a duplicate of this bug. ***

*** Bug 74473 has been marked as a duplicate of this bug. ***

*** Bug 74867 has been marked as a duplicate of this bug. ***

*** Bug 75163 has been marked as a duplicate of this bug. ***

Created attachment 95090
Another version of the same hang - directed here from bug 75502

*** Bug 75999 has been marked as a duplicate of this bug. ***

Changed in dri:
status: Confirmed → In Progress

*** Bug 76408 has been marked as a duplicate of this bug. ***

*** Bug 76677 has been marked as a duplicate of this bug. ***

*** Bug 76801 has been marked as a duplicate of this bug. ***

For what it's worth, running 3.13.7 greatly mitigates this bug, to the point where the dead time is barely noticeable. It happened three times in short order here and I didn't notice any of them:

[ 4562.551141] [drm:ring_stuck] *ERROR* Kicking stuck semaphore on render ring
[ 4582.530028] [drm:ring_stuck] *ERROR* Kicking stuck semaphore on render ring
[ 4633.476199] [drm:ring_stuck] *ERROR* Kicking stuck semaphore on render ring

*** Bug 77043 has been marked as a duplicate of this bug. ***

*** Bug 77058 has been marked as a duplicate of this bug. ***

My stuck ring faults are completely gone with i915.i915_enable_rc6=0. The only side effect seems to be that the fan stays on a bit more (subjectively). HP Pavilion dv6 (Sandybridge).
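
For anyone comparing results with this workaround, it is worth confirming that the option actually reached the running kernel. Here is a minimal Python sketch that looks for it on the kernel command line. The parameter name is the one given above; the helper names are illustrative assumptions, and the sketch keeps the last occurrence on the assumption that a later setting overrides an earlier one.

```python
RC6_PARAM = "i915.i915_enable_rc6"  # parameter name as given in this report


def rc6_setting_from_cmdline(cmdline):
    """Return the value passed for the rc6 parameter, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == RC6_PARAM:
            # Assumption: a later occurrence overrides an earlier one.
            value = val
    return value


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as fh:
        return fh.read()
```

On the affected machine, `rc6_setting_from_cmdline(read_cmdline())` should return `"0"` when the workaround is active, and `None` when the option was not passed at boot.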

Oh that's interesting. We might be able to find a register to prevent rc6 whilst waiting on a semaphore. (Hmm, too bad it isn't ivb or we could just frob forcewake directly.)

(In reply to comment #139)
> Oh that's interesting. We might be able to find a register to prevent rc6
> whilst waiting on a semaphore. (Hmm, too bad it isn't ivb or we could just
> frob forcewake directly.)

Happy to test patches. I'm updating to 3.13.9 tonight. I could add something on top if you have ideas. If you need more info than my attachment to #76801 just let me know.

*** Bug 77147 has been marked as a duplicate of this bug. ***
