[regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on Sandybridge

Bug #1140716 reported by luca
This bug affects 393 people
Affects                      Status        Importance  Assigned to  Milestone
DRI                          Won't Fix     Medium
Mesa                         Unknown       Unknown
linux (Debian)               Fix Released  Unknown
linux (Fedora)               Won't Fix     Undecided
linux (Ubuntu)               Invalid       Critical    Unassigned
  Precise                    Fix Released  Critical    Unassigned
  Quantal                    Invalid       Critical    Unassigned
  Raring                     Invalid       Critical    Unassigned
linux-lts-quantal (Ubuntu)   Invalid       Critical    Unassigned
  Precise                    Fix Released  Critical    Unassigned
  Quantal                    Invalid       Critical    Unassigned
  Raring                     Invalid       Critical    Unassigned
linux-lts-raring (Ubuntu)    Invalid       Critical    Unassigned
  Precise                    Invalid       Critical    Unassigned
mesa (Ubuntu)                Fix Released  Critical    Unassigned
  Precise                    Fix Released  Critical    Unassigned

Bug Description

I'm getting errors about GPU hangs every minute or so (usually only when using Firefox and scrolling a webpage or something similar). I also get an annoying Ubuntu dialog saying there is a "system error".

This didn't happen with 3.5.0-24-generic.


Here is the dmesg:
[15169.033709] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[15169.034517] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[15628.480216] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[15628.480570] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[15844.231372] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[15844.231773] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[20173.232593] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[20173.233211] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[26285.650393] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[26285.650980] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[26285.658405] ------------[ cut here ]------------
[26285.658472] WARNING: at /build/buildd/linux-3.5.0/drivers/gpu/drm/i915/intel_pm.c:2505 gen6_enable_rps+0x706/0x710 [i915]()
[26285.658474] Hardware name: SATELLITE Z830
[26285.658476] Modules linked in: sdhci_pci sdhci btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs reiserfs ext2 snd_hda_codec_hdmi snd_hda_codec_realtek joydev btusb coretemp kvm_intel kvm arc4 ghash_clmulni_intel aesni_intel cryptd aes_x86_64 snd_hda_intel snd_hda_codec snd_hwdep uvcvideo snd_pcm videobuf2_core microcode videodev bnep iwlwifi videobuf2_vmalloc snd_seq_midi psmouse videobuf2_memops snd_rawmidi rfcomm pcspkr snd_seq_midi_event serio_raw snd_seq bluetooth mac80211 snd_timer snd_seq_device i915 drm_kms_helper cfg80211 drm toshiba_acpi snd sparse_keymap soundcore wmi i2c_algo_bit toshiba_bluetooth snd_page_alloc parport_pc mei video mac_hid lpc_ich ppdev nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc lp parport e1000e ahci libahci [last unloaded: sdhci]
[26285.658537] Pid: 23433, comm: kworker/u:0 Not tainted 3.5.0-26-generic #40-Ubuntu
[26285.658539] Call Trace:
[26285.658549] [<ffffffff81051bef>] warn_slowpath_common+0x7f/0xc0
[26285.658553] [<ffffffff81051c4a>] warn_slowpath_null+0x1a/0x20
[26285.658569] [<ffffffffa02d32e6>] gen6_enable_rps+0x706/0x710 [i915]
[26285.658584] [<ffffffffa02bf3f6>] intel_modeset_init_hw+0x66/0xa0 [i915]
[26285.658595] [<ffffffffa02954b4>] i915_reset+0x1a4/0x6e0 [i915]
[26285.658601] [<ffffffff8101257b>] ? __switch_to+0x12b/0x420
[26285.658612] [<ffffffffa029a943>] i915_error_work_func+0xc3/0x110 [i915]
[26285.658618] [<ffffffff8107097a>] process_one_work+0x12a/0x420
[26285.658629] [<ffffffffa029a880>] ? gen6_pm_rps_work+0xe0/0xe0 [i915]
[26285.658632] [<ffffffff8107152e>] worker_thread+0x12e/0x2f0
[26285.658636] [<ffffffff81071400>] ? manage_workers.isra.26+0x200/0x200
[26285.658640] [<ffffffff81076023>] kthread+0x93/0xa0
[26285.658644] [<ffffffff8168a3e4>] kernel_thread_helper+0x4/0x10
[26285.658649] [<ffffffff81075f90>] ? kthread_freezable_should_stop+0x70/0x70
[26285.658652] [<ffffffff8168a3e0>] ? gs_change+0x13/0x13
[26285.658654] ---[ end trace 59c6162fdfcbffee ]---
[26756.021167] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[26756.021426] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[26766.014093] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[26766.014397] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[26932.376233] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[26932.376544] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[26932.384285] ------------[ cut here ]------------
[26932.384354] WARNING: at /build/buildd/linux-3.5.0/drivers/gpu/drm/i915/intel_pm.c:2505 gen6_enable_rps+0x706/0x710 [i915]()
[26932.384356] Hardware name: SATELLITE Z830
[26932.384358] Modules linked in: sdhci_pci sdhci btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs reiserfs ext2 snd_hda_codec_hdmi snd_hda_codec_realtek joydev btusb coretemp kvm_intel kvm arc4 ghash_clmulni_intel aesni_intel cryptd aes_x86_64 snd_hda_intel snd_hda_codec snd_hwdep uvcvideo snd_pcm videobuf2_core microcode videodev bnep iwlwifi videobuf2_vmalloc snd_seq_midi psmouse videobuf2_memops snd_rawmidi rfcomm pcspkr snd_seq_midi_event serio_raw snd_seq bluetooth mac80211 snd_timer snd_seq_device i915 drm_kms_helper cfg80211 drm toshiba_acpi snd sparse_keymap soundcore wmi i2c_algo_bit toshiba_bluetooth snd_page_alloc parport_pc mei video mac_hid lpc_ich ppdev nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc lp parport e1000e ahci libahci [last unloaded: sdhci]
[26932.384421] Pid: 24262, comm: kworker/u:2 Tainted: G W 3.5.0-26-generic #40-Ubuntu
[26932.384422] Call Trace:
[26932.384431] [<ffffffff81051bef>] warn_slowpath_common+0x7f/0xc0
[26932.384436] [<ffffffff81051c4a>] warn_slowpath_null+0x1a/0x20
[26932.384451] [<ffffffffa02d32e6>] gen6_enable_rps+0x706/0x710 [i915]
[26932.384466] [<ffffffffa02bf3f6>] intel_modeset_init_hw+0x66/0xa0 [i915]
[26932.384476] [<ffffffffa02954b4>] i915_reset+0x1a4/0x6e0 [i915]
[26932.384482] [<ffffffff8101257b>] ? __switch_to+0x12b/0x420
[26932.384493] [<ffffffffa029a943>] i915_error_work_func+0xc3/0x110 [i915]
[26932.384500] [<ffffffff8107097a>] process_one_work+0x12a/0x420
[26932.384511] [<ffffffffa029a880>] ? gen6_pm_rps_work+0xe0/0xe0 [i915]
[26932.384514] [<ffffffff8107152e>] worker_thread+0x12e/0x2f0
[26932.384517] [<ffffffff81071400>] ? manage_workers.isra.26+0x200/0x200
[26932.384521] [<ffffffff81076023>] kthread+0x93/0xa0
[26932.384526] [<ffffffff8168a3e4>] kernel_thread_helper+0x4/0x10
[26932.384531] [<ffffffff81075f90>] ? kthread_freezable_should_stop+0x70/0x70
[26932.384534] [<ffffffff8168a3e0>] ? gs_change+0x13/0x13
[26932.384536] ---[ end trace 59c6162fdfcbffef ]---

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: linux-image-3.5.0-26-generic 3.5.0-26.40
ProcVersionSignature: Ubuntu 3.5.0-26.40-generic 3.5.7.6
Uname: Linux 3.5.0-26-generic x86_64
ApportVersion: 2.6.1-0ubuntu10
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: luca 2084 F.... pulseaudio
CheckboxSubmission: f8b82cd9bc23fe075e5068a9824afda5
CheckboxSystem: b1865df84255b8716d3bcc269ff410d1
Date: Sat Mar 2 22:25:14 2013
HibernationDevice: RESUME=UUID=20fe6da8-7d68-4660-953f-6e4ae1d348a7
InstallationDate: Installed on 2012-04-26 (310 days ago)
InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Release amd64 (20120425)
MachineType: TOSHIBA SATELLITE Z830
MarkForUpload: True
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.5.0-26-generic root=UUID=36929bf3-a158-44d9-a80d-3adac2840fa8 ro quiet splash acpi_backlight=vendor i915.i915_enable_rc6=1 i915.lvds_downclock=1 vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.5.0-26-generic N/A
 linux-backports-modules-3.5.0-26-generic N/A
 linux-firmware 1.95
SourcePackage: linux
UpgradeStatus: Upgraded to quantal on 2012-10-28 (125 days ago)
dmi.bios.date: 07/31/2012
dmi.bios.vendor: TOSHIBA
dmi.bios.version: Version 1.70
dmi.board.asset.tag: 0000000000
dmi.board.name: Portable PC
dmi.board.vendor: TOSHIBA
dmi.board.version: Version A0
dmi.chassis.asset.tag: 0000000000
dmi.chassis.type: 10
dmi.chassis.vendor: TOSHIBA
dmi.chassis.version: Version 1.0
dmi.modalias: dmi:bvnTOSHIBA:bvrVersion1.70:bd07/31/2012:svnTOSHIBA:pnSATELLITEZ830:pvrPT22LE-00300GGR:rvnTOSHIBA:rnPortablePC:rvrVersionA0:cvnTOSHIBA:ct10:cvrVersion1.0:
dmi.product.name: SATELLITE Z830
dmi.product.version: PT22LE-00300GGR
dmi.sys.vendor: TOSHIBA
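
To get a feel for how often the hangs occur, the hangcheck lines in a dmesg dump like the one in the bug description can be tallied with a short script. This is a minimal sketch and not part of the original report: the names `hang_times` and `hang_intervals` are illustrative, and the pattern assumes the exact message wording shown in this report.

```python
import re

# Matches lines such as:
# [15169.033709] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
HANG_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+\[drm:i915_hangcheck_hung\]")


def hang_times(dmesg_text):
    """Return the kernel timestamps (seconds since boot) of each reported hang."""
    return [float(m.group(1)) for m in HANG_RE.finditer(dmesg_text)]


def hang_intervals(times):
    """Return the gaps, in seconds, between consecutive hangs."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    # Sample lines copied from the dmesg excerpt in this report.
    sample = """\
[15169.033709] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[15169.034517] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[15628.480216] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[15844.231372] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
"""
    times = hang_times(sample)
    print(len(times), "hangs")
    print([round(gap) for gap in hang_intervals(times)])
```

On a live system the text could come from the output of `dmesg` saved to a file; lines that do not match the hangcheck message are simply ignored.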

Revision history for this message
luca (llucax) wrote :
Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
luca (llucax) wrote : Re: [regression] 3.5.0-26-generic CPU hangs

Another note: when these hangs happen, I get graphics corruption (usually in the fonts/text).

Revision history for this message
luca (llucax) wrote :

kernel 3.5.0-25-generic also seems to work fine.

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: regression-update
Revision history for this message
luca (llucax) wrote :

After I resumed from suspend with kernel 3.5.0-25-generic, I again got the annoying dialogs saying that a GPU hang was detected and asking me to report a bug (I have no idea where that report goes), but looking at dmesg I can't see anything strange [1]. How can I find out why those dialogs are being opened, to see if something is wrong?

I got the annoying dialog several times in a very short period, about 10 times in 5 minutes, and then it stopped. After that I suspended and resumed my laptop a couple of times and it hasn't happened again so far.

[1] Except for messages like the following, but I have been getting these since I bought this computer about a year ago and never had those annoying dialogs about a GPU hang:
[52682.020386] CPU1: Package power limit notification (total events = 5770)
[52682.020389] CPU3: Package power limit notification (total events = 5769)
[52682.020391] CPU2: Package power limit notification (total events = 5761)
[52682.020393] CPU0: Package power limit notification (total events = 5746)
[52682.021517] CPU3: Package power limit normal
[52682.021520] CPU1: Package power limit normal
[52682.021521] CPU2: Package power limit normal
[52682.021526] CPU0: Package power limit normal

Revision history for this message
luca (llucax) wrote :

Also, I couldn't see any graphic corruption this last time with kernel 3.5.0-25

Revision history for this message
Craig McQueen (cmcqueen1975) wrote :

This affects me, but in my case I'm running Ubuntu 12.04, and the problem seems to be with kernel 3.2.0-39. Booting to kernel 3.2.0-38 seems to have fixed it.

Revision history for this message
Hans (old-man999) wrote :
Revision history for this message
luca (llucax) wrote :

Seems to be fixed in linux-image-3.5.0-26-generic 3.5.0-26.42

Revision history for this message
luca (llucax) wrote :

Nope, still getting it with linux-image-3.5.0-26-generic 3.5.0-26.42

[32861.907463] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[32861.907470] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[32861.911988] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
...
[39199.903510] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[39199.903846] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

Revision history for this message
Torsten Hilbrich (torsten-hilbrich) wrote :

The same here: the previous kernel 3.5.0-25-generic works without problems, but 3.5.0-26.42 hung just now:

$ dmesg|grep i915
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.5.0-26-generic root=/dev/mapper/System-root ro quiet splash i915.i915_enable_rc6=1 vt.handoff=7
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.5.0-26-generic root=/dev/mapper/System-root ro quiet splash i915.i915_enable_rc6=1 vt.handoff=7
[ 1.667363] i915 0000:00:02.0: power state changed by ACPI to D0
[ 1.667367] i915 0000:00:02.0: power state changed by ACPI to D0
[ 1.667652] i915 0000:00:02.0: setting latency timer to 64
[ 1.687950] i915 0000:00:02.0: irq 44 for MSI/MSI-X
[ 2.429882] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[ 330.684154] i915 0000:00:02.0: power state changed by ACPI to D3
[ 331.826825] i915 0000:00:02.0: power state changed by ACPI to D0
[ 331.826829] i915 0000:00:02.0: power state changed by ACPI to D0
[ 331.826830] i915 0000:00:02.0: setting latency timer to 64
[ 1677.075872] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1677.075876] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

Content of i915_error_state attached.

Will disable rc6 and test what happens then.

Revision history for this message
Torsten Hilbrich (torsten-hilbrich) wrote :

With rc6 off the hangup happened 2 minutes after booting:

$ dmesg|grep i915
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.5.0-26-generic root=/dev/mapper/System-root ro quiet splash i915.i915_enable_rc6=0 vt.handoff=7
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.5.0-26-generic root=/dev/mapper/System-root ro quiet splash i915.i915_enable_rc6=0 vt.handoff=7
[ 0.857239] i915 0000:00:02.0: power state changed by ACPI to D0
[ 0.857242] i915 0000:00:02.0: power state changed by ACPI to D0
[ 0.857458] i915 0000:00:02.0: setting latency timer to 64
[ 0.877771] i915 0000:00:02.0: irq 44 for MSI/MSI-X
[ 1.619983] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[ 128.787009] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 128.787013] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 254.699283] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

Seems it's time to return to 3.5.0-25-generic.
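
Since the two boots above differ only in the `i915.i915_enable_rc6` parameter, it can help to confirm which i915 options a given boot actually used. The following is a minimal sketch; `i915_params` is a hypothetical helper name, and it only splits a kernel command line string such as the contents of `/proc/cmdline`.

```python
def i915_params(cmdline):
    """Extract i915.* module options from a kernel command line string."""
    params = {}
    for token in cmdline.split():
        if not token.startswith("i915."):
            continue
        # Split "i915.name=value" into name and value (value may be empty).
        name, _, value = token[len("i915."):].partition("=")
        params[name] = value
    return params


if __name__ == "__main__":
    # On a running system this string would be read from /proc/cmdline.
    cmdline = (
        "BOOT_IMAGE=/vmlinuz-3.5.0-26-generic root=/dev/mapper/System-root "
        "ro quiet splash i915.i915_enable_rc6=0 vt.handoff=7"
    )
    print(i915_params(cmdline))
```

This only reports what was passed at boot; it does not tell whether the driver honoured the setting, which the "Enabling RC6 states" line in dmesg shows.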

Revision history for this message
Laurent (l-perlat) wrote :

Same problem here :

Visual corruptions + "GPU hang" error when scrolling in Firefox with 3.5.0-26.

Everything back to normal on 3.5.0-25 (Linux 3.5.0-25-generic #39-Ubuntu SMP Mon Feb 25 18:26:58 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux)

Revision history for this message
tobyS (tobias-schlitt) wrote :

Besides visual corruption and hangs, I also experience complete system hangs (no reaction until a hard reboot) and occasional kernel panics. I therefore wonder why this report does not get a higher priority.

czigor (czigor)
summary: - [regression] 3.5.0-26-generic CPU hangs
+ [regression] 3.5.0-26-generic GPU hangs
Revision history for this message
shuerhaaken (shkn) wrote : Re: [regression] 3.5.0-26-generic GPU hangs

Same issue here. It happens very often during Firefox usage, but also on other occasions.

[51278.392895] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[51278.392901] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[51278.397785] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

Revision history for this message
shuerhaaken (shkn) wrote :

This is really getting annoying, is anybody taking care of this?

Revision history for this message
czigor (czigor) wrote :

@shkn:
Using 3.5.0-25-generic made my PC usable again. I get an error message only at login.

Revision history for this message
Timo Aaltonen (tjaalton) wrote :

it's one of these commits (from the quantal kernel), likely the top one since it's happening on sandybridge:

817e8fdee14b05d drm/i915: Implement WaDisableHiZPlanesWhenMSAAEnabled
4c443ec9afe7f6f drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for scanline waits
f534135423c7028 drm/i915: Disable AsyncFlip performance optimisations
c0c1fd8a18479f0 drm/i915: Invalidate the relocation presumed_offsets along the slow path

Changed in linux (Ubuntu):
assignee: nobody → Ubuntu Kernel Team (ubuntu-kernel-team)
importance: Medium → Critical
Changed in linux (Ubuntu Quantal):
importance: Undecided → Critical
status: New → Confirmed
Changed in linux (Ubuntu Precise):
importance: Undecided → Critical
status: New → Confirmed
Revision history for this message
Timo Aaltonen (tjaalton) wrote :

note that I'm not sure it's affecting raring, maybe not.

Adam Conrad (adconrad)
Changed in linux (Ubuntu Precise):
status: Confirmed → Invalid
Changed in linux-lts-quantal (Ubuntu Precise):
status: New → Confirmed
Changed in linux-lts-quantal (Ubuntu Quantal):
status: New → Invalid
Changed in linux-lts-quantal (Ubuntu Raring):
status: New → Invalid
Changed in linux-lts-quantal (Ubuntu Precise):
importance: Undecided → Critical
Changed in linux (Ubuntu Precise):
importance: Critical → Undecided
tags: added: performing-bisect
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I'd like to perform a kernel bisect to identify the exact commit that introduced this regression. However, it would be good to test the latest mainline and a test kernel with commit 817e8fdee14b05d reverted.

The latest mainline kernel can be downloaded from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc3-raring/

Can folks affected by this bug test the v3.9-rc3 kernel?

One thing to note, you will need to install both the linux-image and linux-image-extra .deb packages.

I will also build a Quantal test kernel with commit 817e8fdee14b05d reverted and post a link shortly.

Thanks in advance!

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built a Quantal test kernel with commit 817e8fdee14b05d reverted. The kernel can be downloaded from:
http://people.canonical.com/~jsalisbury/lp1140716/

Can folks affected by this bug test this kernel and report back if it fixes the issue?

Revision history for this message
Gard Spreemann (gspreemann) wrote :

@jsalisbury: I could not successfully test the kernel you linked to in comment #22, as it rendered my system unusable. X started at 640x480, there was no working keyboard/mouse, and I could not SSH in.

Revision history for this message
franglais.125 (franglais.125-deactivatedaccount) wrote :

@jsalisbury: Thanks for pointing to this kernel version. I have been able to successfully test kernel v3.9-rc3 on Precise 12.04.2 (I am running with quantal-lts xorg stack).
I have been running on it for ~ 3 hours so far with success. It usually took some time for me to hit this bug on my Dell V131, so some more testing might be required.
I will report back if I hit the bug again. So far so good.

Revision history for this message
luca (llucax) wrote :

Also initial success for now. Still getting the annoying dialog at startup though (but no signs of GPU hangs in dmesg).

Revision history for this message
Torsten Hilbrich (torsten-hilbrich) wrote :

@jsalisbury: I tested your kernel 3.5.0-27-generic #45~lp1140716v1 (from comment #22); it was no improvement for my system. I got two hangups within the first hour (one S3 cycle at 1985); the second one forced me to turn off the system:

[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.5.0-27-generic root=/dev/mapper/System-root ro quiet splash i915.i915_enable_rc6=1 vt.handoff=7
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.5.0-27-generic root=/dev/mapper/System-root ro quiet splash i915.i915_enable_rc6=1 vt.handoff=7
[ 0.804805] i915 0000:00:02.0: power state changed by ACPI to D0
[ 0.804809] i915 0000:00:02.0: power state changed by ACPI to D0
[ 0.805030] i915 0000:00:02.0: setting latency timer to 64
[ 0.824988] i915 0000:00:02.0: irq 43 for MSI/MSI-X
[ 1.563280] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[ 1894.853449] i915 0000:00:02.0: power state changed by ACPI to D3
[ 1896.202702] i915 0000:00:02.0: power state changed by ACPI to D0
[ 1896.202708] i915 0000:00:02.0: power state changed by ACPI to D0
[ 1896.202720] i915 0000:00:02.0: setting latency timer to 64
[ 1984.429241] i915 0000:00:02.0: power state changed by ACPI to D3
[ 1985.767157] i915 0000:00:02.0: power state changed by ACPI to D0
[ 1985.767160] i915 0000:00:02.0: power state changed by ACPI to D0
[ 1985.767168] i915 0000:00:02.0: setting latency timer to 64
[ 2132.278551] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 2132.278555] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 3504.895781] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

Revision history for this message
luca (llucax) wrote :

Torsten, maybe you are having a different issue; note that your hang doesn't look related to the RC6 state.

 [51278.397785] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

BTW, my system is still surviving without hangs with the patched 3.5 kernel.

Revision history for this message
luca (llucax) wrote :

I just had a burst of dialogs reporting non-existent GPU hangs (with the patched 3.5 kernel). The GPU hangs are not reported in dmesg though, so I don't know where the dialogs are getting them from. There is also no corruption or anything. It seems the dialog madness starts when an unrelated program crashes. Maybe it is just an apport bug? How should I proceed to see what's really going on?

Revision history for this message
Alexis Lauthier (alx7539-launchpad) wrote :

@jsalisbury: I've been running your 3.5.0-27-generic #45~lp1140716v1 for 5 hours and I've already had 3 hangs. No improvement here.

[ 5733.121323] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 5733.121330] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 5733.124957] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

Revision history for this message
luca (llucax) wrote :

OK, it took a while, but I finally got the GPU hang with kernel 3.5.0-27-generic #45~lp1140716v1:

[22344.085044] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[22344.085051] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[22344.090106] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

Revision history for this message
luca (llucax) wrote :

Always happens with Firefox, and only with certain sites (consistently).

Revision history for this message
luca (llucax) wrote :

I got this with a second hang:

[22344.085044] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[22344.085051] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[22344.090106] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[23652.138382] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[23652.138898] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[23652.146420] ------------[ cut here ]------------
[23652.146491] WARNING: at /home/jsalisbury/bugs/lp1140716/ubuntu-quantal/drivers/gpu/drm/i915/intel_pm.c:2505 gen6_enable_rps+0x706/0x710 [i915]()
[23652.146495] Hardware name: SATELLITE Z830
[23652.146497] Modules linked in: sdhci_pci sdhci snd_hda_codec_hdmi snd_hda_codec_realtek joydev btusb coretemp kvm_intel kvm ghash_clmulni_intel aesni_intel arc4 cryptd aes_x86_64 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm uvcvideo videobuf2_core videodev snd_seq_midi videobuf2_vmalloc videobuf2_memops snd_rawmidi microcode snd_seq_midi_event iwlwifi snd_seq snd_timer snd_seq_device i915 bnep rfcomm mac80211 toshiba_acpi sparse_keymap drm_kms_helper wmi toshiba_bluetooth snd pcspkr bluetooth drm i2c_algo_bit cfg80211 soundcore psmouse mac_hid snd_page_alloc video serio_raw mei lpc_ich parport_pc ppdev nfsd nfs lockd fscache auth_rpcgss nfs_acl lp sunrpc parport ahci libahci e1000e [last unloaded: sdhci]
[23652.146578] Pid: 3451, comm: kworker/u:0 Not tainted 3.5.0-27-generic #45~lp1140716v1
[23652.146581] Call Trace:
[23652.146592] [<ffffffff81051bef>] warn_slowpath_common+0x7f/0xc0
[23652.146599] [<ffffffff81051c4a>] warn_slowpath_null+0x1a/0x20
[23652.146621] [<ffffffffa03f6316>] gen6_enable_rps+0x706/0x710 [i915]
[23652.146640] [<ffffffffa03e2446>] intel_modeset_init_hw+0x66/0xa0 [i915]
[23652.146655] [<ffffffffa03b84b4>] i915_reset+0x1a4/0x6e0 [i915]
[23652.146663] [<ffffffff8101257b>] ? __switch_to+0x12b/0x420
[23652.146679] [<ffffffffa03bd943>] i915_error_work_func+0xc3/0x110 [i915]
[23652.146688] [<ffffffff8107098a>] process_one_work+0x12a/0x420
[23652.146701] [<ffffffffa03bd880>] ? gen6_pm_rps_work+0xe0/0xe0 [i915]
[23652.146707] [<ffffffff8107153e>] worker_thread+0x12e/0x2f0
[23652.146712] [<ffffffff81071410>] ? manage_workers.isra.26+0x200/0x200
[23652.146719] [<ffffffff81076033>] kthread+0x93/0xa0
[23652.146726] [<ffffffff8168ab24>] kernel_thread_helper+0x4/0x10
[23652.146732] [<ffffffff81075fa0>] ? kthread_freezable_should_stop+0x70/0x70
[23652.146737] [<ffffffff8168ab20>] ? gs_change+0x13/0x13
[23652.146740] ---[ end trace 2153106cc632835c ]---
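
The warning trace above follows the same path as the one in the original description (i915_error_work_func, i915_reset, intel_modeset_init_hw, gen6_enable_rps). One way to compare traces across kernels is to pull out just the function names, skipping frames marked with `?`, which the kernel prints for addresses it found on the stack but could not confirm as part of the call chain. This is a minimal sketch; `reliable_frames` is a hypothetical helper, and the pattern assumes the frame layout shown in this report.

```python
import re

# Matches frames such as:
# [23652.146621] [<ffffffffa03f6316>] gen6_enable_rps+0x706/0x710 [i915]
# [23652.146663] [<ffffffff8101257b>] ? __switch_to+0x12b/0x420
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(trace_text):
    """Return function names from a call trace, skipping '?' frames."""
    frames = []
    for match in FRAME_RE.finditer(trace_text):
        if match.group(1) is None:  # no leading '?', so keep the frame
            frames.append(match.group(2))
    return frames


if __name__ == "__main__":
    sample = """\
[23652.146621] [<ffffffffa03f6316>] gen6_enable_rps+0x706/0x710 [i915]
[23652.146640] [<ffffffffa03e2446>] intel_modeset_init_hw+0x66/0xa0 [i915]
[23652.146655] [<ffffffffa03b84b4>] i915_reset+0x1a4/0x6e0 [i915]
[23652.146663] [<ffffffff8101257b>] ? __switch_to+0x12b/0x420
"""
    print(reliable_frames(sample))
```

Two traces that yield the same list of names, as the ones in this thread do, were most likely produced by the same code path even though the module load addresses differ.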

Revision history for this message
Gard Spreemann (gspreemann) wrote :

I'm confused as to where the commits referenced by tjaalton in comment #19 live, but for what it's worth, I seem to have a stable system after applying reverse diffs of the following commits from the linux-3.5.y branch of git://kernel.ubuntu.com/ubuntu/linux.git to the 3.5.0-27.45 sources:

2964148 - drm/i915: Implement WaDisableHiZPlanesWhenMSAAEnabled
899b550 - drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for scanline waits

Just reverting the first, or using jsalisbury's kernel from comment #22 (ignore my comment #23, I was being an idiot and forgot the modules) gives me a GPU hang and/or graphics corruption within minutes, especially quickly if opening Firefox. After reverting both of the above, I haven't been able to hang the system yet.

Revision history for this message
Torsten Hilbrich (torsten-hilbrich) wrote :

Kernel 3.9.0-030900rc3-generic from comment #21 is much more stable for me, no problems so far after 4h of operation.

Revision history for this message
franglais.125 (franglais.125-deactivatedaccount) wrote :

@jsalisbury: After a few days of use and many suspend-resume cycles, I have yet to encounter a problem with kernel 3.9-rc3 (as indicated in comment #21). No problems whatsoever on my Dell v131 (i5 Sandybridge)...

Revision history for this message
Peter Saunderson (peteasa) wrote :

I got this a lot with kernel 3.5.0-26-generic and used a quick workaround to avoid the problem: http://askubuntu.com/questions/225356/how-can-i-enable-the-sna-acceleration-method-for-intel-cards-under-ubuntu-12-04

SNA does not seem to have the same issue, only UXA. If I have time I can try a new kernel, but I have spent so much time on this already that it may be a few days before I get to it.

Revision history for this message
Max Rameau (afrimax-e) wrote :

I had the problem right after updating (not upgrading) on Saturday using 12.04.

I was able to control it by logging into 2D, immediately opening the System Monitor and shutting down the three instances of Ubuntu One (login, synch and launch), because the machine would freeze upon login to Ubuntu One. I then had to shut down zeitgeist-fts, because that would start eating up resources (up to 300 MB of memory at one point).

At that point, I just decided to reinstall with 12.10. I did that and it worked fine for an hour, so I started transferring my backed-up files and logged into Ubuntu One while running the updates. The problems began immediately, including resource use going up to 100% for long periods of time, mainly through the multiplication of the gkts (?) service. It only used 3.8 MB at a time, but at one point there were 20 instances of it open. I concluded it was Ubuntu One causing the problem, so I reinstalled again, this time not logging into Ubuntu One. No problems for 4 hours, even as I installed software. Then I ran the automatic software update, and the problems began again immediately.

Constant crashing, crazy graphic corruption and other issues. Ran the system log and got the similar error:

kernel [224.243459] [drm: Enable RC6 States: RC6 off, RC6p off, RC6p off]
kernal [246.465377] [drm: i95_hangcheck_hung] *ERROR* Hangcheck timer elapsed GPU hung

etc., etc.

This is a nightmare. Need a fix.

Revision history for this message
Matthew Eaton (meaton) wrote :

Test kernel did not fix the issue for me.

Linux matt-work 3.5.0-27-generic #45~lp1140716v1 SMP Fri Mar 22 15:50:00 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Mar 25 08:12:15 matt-work kernel: [ 158.302349] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 25 08:12:15 matt-work kernel: [ 158.302353] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Mar 25 08:12:15 matt-work kernel: [ 158.305230] [drm] Enabling RC6 states: RC6 off, RC6p off, RC6pp off
Mar 25 08:12:36 matt-work kernel: [ 179.663557] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 25 08:12:36 matt-work kernel: [ 179.663780] [drm] Enabling RC6 states: RC6 off, RC6p off, RC6pp off

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Thanks, everyone, for testing. So it sounds like my test kernel did not fix this bug. However, it sounds like this bug is fixed in the v3.9 mainline kernel, at least in rc3.

I can perform a "Reverse" kernel bisect to identify the commit that fixes this bug. It will first require us to identify the first v3.9 release candidate that does not exhibit this bug.

We know that it is fixed in rc3, so it would be good to test rc1 and rc2. Can folks affected by this bug test those two release candidates:

v3.9-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc1-raring/
v3.9-rc2: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc2-raring/
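
The "reverse" bisect described above amounts to a binary search for the first build that no longer shows the bug, assuming every build after the fix stays fixed. The sketch below illustrates the idea on an ordered list of build labels. The names `first_fixed` and `is_fixed` and the sample labels and results are purely illustrative assumptions, not the tooling or test results from this bug; a real bisect would normally be driven by `git bisect` on the kernel tree, with each answer coming from a tester booting that build.

```python
def first_fixed(versions, is_fixed):
    """Binary-search an ordered list for the first version where is_fixed() is True.

    Assumes the list is ordered oldest to newest and that once the bug is
    fixed it stays fixed. Returns None if no version is fixed.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_fixed(versions[mid]):
            hi = mid        # the fix landed at mid or earlier
        else:
            lo = mid + 1    # the fix landed after mid
    return versions[lo] if lo < len(versions) else None


if __name__ == "__main__":
    # Hypothetical build labels and results, for illustration only.
    builds = ["build-1", "build-2", "build-3", "build-4", "build-5"]
    hangs_gone = {
        "build-1": False,
        "build-2": False,
        "build-3": True,
        "build-4": True,
        "build-5": True,
    }
    print(first_fixed(builds, hangs_gone.get))
```

Each step halves the remaining range, which is why narrowing down to the first good release candidate before bisecting individual commits saves test boots.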

Revision history for this message
Matthew Eaton (meaton) wrote :

I've been on the rc1 kernel for about 3 hours with no problem.

Linux matt-work 3.9.0-030900rc1-generic #201303060659 SMP Wed Mar 6 12:00:04 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

tags: added: kernel-key
Robert Hooker (sarvatt)
Changed in linux (Ubuntu Precise):
status: Invalid → Confirmed
importance: Undecided → Critical
Changed in linux (Ubuntu Raring):
assignee: Ubuntu Kernel Team (ubuntu-kernel-team) → nobody
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Robert Hooker (sarvatt)
summary: - [regression] 3.5.0-26-generic GPU hangs
+ [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs
summary: - [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs
+ [regression] 3.5.0-26-generic and 3.2.0-39-generic GPU hangs on
+ Sandybridge
Robert Hooker (sarvatt)
Changed in linux (Ubuntu Raring):
status: Confirmed → Invalid
Changed in linux (Ubuntu Quantal):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Precise):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Raring):
assignee: Canonical Kernel Team (canonical-kernel-team) → nobody
Kamil (lampshade-t)
tags: added: apport-collected
Aymeric (mulx)
tags: added: precise
tags: removed: kernel-key
tags: removed: performing-bisect
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Quantal):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Precise):
status: Confirmed → Fix Committed
Steve Conklin (sconklin)
tags: added: verification-needed-precise
tags: added: verification-needed-quantal
tags: added: verification-done-precise
removed: verification-needed-precise
tags: added: verification-done-quantal
removed: verification-needed-quantal
tags: added: verification-failed-precise
removed: verification-done-precise
tags: added: verification-done-precise
removed: verification-failed-precise
Sergio (sergio-otero)
Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released
Brad Figg (brad-figg)
Changed in linux (Ubuntu Precise):
status: Fix Released → Fix Committed
Changed in linux (Ubuntu Raring):
status: Invalid → New
Changed in linux (Ubuntu Raring):
status: New → Confirmed
tags: added: kernel-da-key
Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released
Changed in linux-lts-quantal (Ubuntu Precise):
status: Confirmed → Fix Released
Changed in linux (Ubuntu Quantal):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Raring):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
tags: added: kernel-stable-key
Changed in linux (Debian):
status: Unknown → New
Changed in linux:
importance: Unknown → Medium
status: Unknown → Invalid
Changed in dri:
importance: Unknown → Medium
status: Unknown → Confirmed
no longer affects: linux-lts-raring (Ubuntu Quantal)
no longer affects: linux-lts-raring (Ubuntu Raring)
tags: added: patch
Changed in linux-lts-raring (Ubuntu Precise):
status: New → Confirmed
Changed in linux-lts-raring (Ubuntu):
status: New → Confirmed
gokul (gokulnathonline)
information type: Public → Public Security
information type: Public Security → Public
theghost (theghost)
tags: added: saucy
Changed in linux-lts-quantal (Ubuntu):
importance: Undecided → Critical
Changed in linux-lts-quantal (Ubuntu Quantal):
importance: Undecided → Critical
Changed in linux-lts-quantal (Ubuntu Raring):
importance: Undecided → Critical
Changed in linux-lts-raring (Ubuntu):
importance: Undecided → Critical
Changed in linux-lts-raring (Ubuntu Precise):
importance: Undecided → Critical
Changed in linux-lts-quantal (Ubuntu Precise):
status: Fix Released → Invalid
Changed in linux (Ubuntu Raring):
status: Confirmed → Invalid
Changed in linux (Ubuntu Quantal):
status: Fix Released → Invalid
Changed in linux (Ubuntu Precise):
status: Fix Released → Invalid
Changed in linux-lts-raring (Ubuntu):
status: Confirmed → Triaged
Changed in linux-lts-raring (Ubuntu):
status: Triaged → Invalid
Changed in linux-lts-raring (Ubuntu Precise):
status: Confirmed → Invalid
Changed in mesa (Ubuntu):
importance: Undecided → Critical
status: New → Triaged
Changed in linux (Ubuntu Precise):
assignee: Canonical Kernel Team (canonical-kernel-team) → nobody
Changed in linux (Ubuntu Quantal):
assignee: Canonical Kernel Team (canonical-kernel-team) → nobody
Changed in linux (Ubuntu Raring):
assignee: Canonical Kernel Team (canonical-kernel-team) → nobody
Changed in mesa (Ubuntu Precise):
status: New → Triaged
importance: Undecided → Critical
Changed in dri:
status: Confirmed → In Progress
Mathew Hodson (mhodson)
Changed in linux:
importance: Medium → Unknown
status: Invalid → Unknown
affects: linux → mesa
Mathew Hodson (mhodson)
tags: removed: saucy
Mathew Hodson (mhodson)
tags: added: metabug
Andy Whitcroft (apw)
Changed in linux-lts-quantal (Ubuntu Precise):
status: Invalid → Fix Committed
Changed in linux (Ubuntu Precise):
status: Invalid → Fix Committed
Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 91832 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Samuel Rakitničan (semirocket) wrote :

(In reply to Chris Wilson from comment #192)
> (In reply to comment #191)
> > What information is most useful for these repeating issues, as it just
> > happened again:
> >
> > Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139690] [drm] stuck on
> > render ring
> > Sep 16 08:32:59 arrowsmithlap1 kernel: [1182242.139699] [drm] stuck on
> > blitter ring
>
> So long as it is the same event, there is no more information we need other
> than testing feedback for an eventual workaround.

Is this the same bug?

$ journalctl -p 3 -b -1
Ruj 25 02:13:01 crnigrom kernel: [drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.
Ruj 25 02:13:01 crnigrom kernel: [drm:__gen6_gt_wait_for_thread_c0.isra.16 [i915]] *ERROR* GT thread status wait timed out
... [ repeated messages ] ...
Ruj 25 02:13:33 crnigrom kernel: [drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.
Ruj 25 02:13:33 crnigrom kernel: [drm:__gen6_gt_wait_for_thread_c0.isra.16 [i915]] *ERROR* GT thread status wait timed out
Ruj 25 02:13:34 crnigrom kernel: [drm:stop_ring [i915]] *ERROR* render ring : timed out trying to stop ring
Ruj 25 02:13:34 crnigrom kernel: [drm:init_ring_common [i915]] *ERROR* render ring initialization failed ctl 00000000 (valid? 0) head 00000000 tail 00000000 start 00000000 [expected 00000000]
Ruj 25 02:13:34 crnigrom kernel: [drm:i915_reset [i915]] *ERROR* Failed hw init on reset -5
Ruj 25 02:13:34 crnigrom gnome-session[1823]: Unrecoverable failure in required component gnome-shell.desktop

After this, gnome crashes with the "Oh No Something Is Wrong" screen.

$ uname -r
4.1.7-200.fc22.x86_64

Hardware i3-2100 CPU/GPU

This bug has been going on for a very long time, but at least the computer no longer hard-freezes. Gnome still crashes, though, so any running GTK applications that are in the middle of something stall.
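
When comparing logs like the ones above against other reports, a quick tally of which i915 functions raised errors can help. The following is a minimal sketch, not part of any i915 tooling: the regular expression and the `summarize_i915_errors` helper are hypothetical and assume only the message format visible in this report.

```python
import re
from collections import Counter

# Matches kernel messages such as:
#   [drm:fw_domains_get [i915]] *ERROR* render: timed out ...
#   [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
# The pattern is an assumption derived from the log lines in this report.
I915_ERROR = re.compile(
    r"\[drm:(?P<func>[\w.]+)(?: \[i915\])?\] \*ERROR\* (?P<msg>.*)"
)


def summarize_i915_errors(lines):
    """Count *ERROR* messages per reporting function (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = I915_ERROR.search(line)
        if match:
            counts[match.group("func")] += 1
    return counts


# Sample lines copied from the journalctl output above.
SAMPLE_LOG = [
    "Ruj 25 02:13:01 crnigrom kernel: [drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.",
    "Ruj 25 02:13:01 crnigrom kernel: [drm:__gen6_gt_wait_for_thread_c0.isra.16 [i915]] *ERROR* GT thread status wait timed out",
    "Ruj 25 02:13:33 crnigrom kernel: [drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.",
    "Ruj 25 02:13:34 crnigrom kernel: [drm:stop_ring [i915]] *ERROR* render ring : timed out trying to stop ring",
    "Ruj 25 02:13:34 crnigrom gnome-session[1823]: Unrecoverable failure in required component gnome-shell.desktop",
]

summary = summarize_i915_errors(SAMPLE_LOG)
for func, count in summary.most_common():
    print(f"{count:3d}  {func}")
```

Saved journal output can be passed in the same way, for example by iterating over an open text file instead of `SAMPLE_LOG`.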

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 92118 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 92739 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Arrowsmith (arrowsmith) wrote :

FWIW, my issue (https://bugs.freedesktop.org/show_bug.cgi?id=54226#c191), was resolved by uninstalling various components, re-installing and updating them. I have a hunch (completely unproven) that it was a transparent bit-fail issue from the SSD. By un-installing and re-installing, the files were likely installed to a different location on the drive. It wasn't configuration, as I tried erasing, and even rolling back to defaults, with the problem still persisting. As it was almost daily, prior to uninstall, and hasn't happened since the install, this is all I can attribute it to.

HTH someone.

Revision history for this message
In , Jefbed (jefbed) wrote :

Created attachment 119432
attachment-28908-0.html

I reported this bug from a system without an SSD. Recently, I have not
seen the kernel messages appear however--currently on linux 4.2.5.

Revision history for this message
In , Arrowsmith (arrowsmith) wrote :

(In reply to Jeffrey E. Bedard from comment #236)
> Created attachment 119432 [details]
> attachment-28908-0.html
>
> I reported this bug from a system without an SSD. Recently, I have not
> seen the kernel messages appear however--currently on linux 4.2.5.

Ah, let me clarify that earlier comment: I dd'd a failing spinning drive to an SSD. There was lots of clicking. Upgraded packages as they came in, but no change. Only the uninstall and re-install cleared the repeat button. :)

Revision history for this message
In , Jefbed (jefbed) wrote :

Created attachment 119433
attachment-32271-0.html

I think this bug can be marked as closed with the latest linux/mesa/xorg
versions :)

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 92927 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93057 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Kurt Roeckx (kurt-roeckx) wrote :

Created attachment 120189
error state with 4.2 kernel

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93331 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93482 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93493 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 89524 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93595 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93876 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 93824 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 94057 has been marked as a duplicate of this bug. ***

Revision history for this message
In , sander eikelenboom (b-linux) wrote :

Sorry to say, but:
Is there a way to get off the CC-list of this slightly depressing kind of "catch-all" bug ?
It unfortunately doesn't seem to have been going anywhere for the last 3 to 4 years, except
for an endless stream of duplicates being appended.

--
Sander

Revision history for this message
In , Jani-nikula (jani-nikula) wrote :

(In reply to Sander Eikelenboom from comment #250)
> Is there a way to get off the CC-list of this slightly depressing kind of
> "catch-all" bug ?

CC list is at the top right corner. Choose the address, tick "Remove selected CCs", and hit Save Changes.

I've done this for you now.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 95238 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Samantham (samantham) wrote :

Chris, I seem to be experiencing this bug in Linux 4.7rc3 on an x220 ThinkPad with the Intel HD 3000 chipset. I was getting random full system freezes, with the machine unresponsive over the network.

The main messages before the crash were:
Jun 23 19:11:18 athena kernel: [drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.
Jun 23 19:11:18 athena kernel: [drm:__gen6_gt_wait_for_thread_c0.isra.7 [i915]] *ERROR* GT thread status wait timed out.

I haven't been able to reproduce the original crash easily, but I CAN reproduce a full system lockup every time by running the following intel-gpu-tools tests (I have not come close to running all the tests, though) [**This may or may not be related to the original crash**]

gem_sync, subtest: bsd2-hang
drv_hangman, subtest: error-state-capture-bit

I do not know if these tests are helpful or related (maybe some are known to fail? not sure).
I had drm debugging turned on when I ran those tests (drm.debug=0x1e log_buf_len=1M).
I can post logs of the hangs associated with the two tests/subtests and run any other tests if you desire (with kernel drm debug on). I will wait for the original issue to reappear with drm debug on before posting that log, though. Given the number of similar bugs, you may already have the CALL TRACE and non-debug level logs.

I know how to patch and am able to compile kernels to test. The bug affects me maybe once every 1 or 2 days. I use XOrg with Glamor. I have been seeing these crashes since 4.6 (maybe 4.5 or earlier, not sure).

I know how to apply patches and am able to compile drm-next or any patches you have to see if this issue can be isolated. Thanks, sorry for the long response.
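
For readers unsure what `drm.debug=0x1e` enables: the value is a bitmask of debug categories. The sketch below decodes it under the assumption that the bit assignments match those in the kernel's DRM headers of the 4.x era (core, driver, kms, prime, atomic, vblank); the table and the `decode_drm_debug` helper are illustrative, so check the documentation for your kernel version before relying on them.

```python
# Bit assignments for the drm.debug module parameter. This table is an
# assumption based on the 4.x-era DRM headers; newer kernels define more bits.
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
    0x10: "atomic",
    0x20: "vblank",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories set in mask (hypothetical helper)."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


enabled = decode_drm_debug(0x1E)
print(", ".join(enabled))
```

Under that assumption, 0x1e turns on driver, kms, prime and atomic messages while leaving the core and vblank categories off.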

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 97304 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 97451 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Yann-argotti (yann-argotti) wrote :

*** Bug 98294 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 98807 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 100245 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Ricardo-vega-u (ricardo-vega-u) wrote :

Adding tag into "Whiteboard" field - ReadyForDev
The bug is still active
*Status is correct
*Platform is included
*Feature is included
*Priority and Severity correctly set
*Logs included

Revision history for this message
In , Samuel Rakitničan (semirocket) wrote :

I no longer seem to be getting the Gnome crashes mentioned above on my Sandybridge with mainline kernels, currently 4.11, and I think I was not getting any issues even with 4.10. With mainline longterm 4.4.61 and the default CentOS 7 kernels, I am definitely getting very frequent GPU crashes that bring down Gnome.

So it is either fixed for good or it has become much rarer. The issue I am/was experiencing happens when Gnome is running; it does not happen when only GDM is loaded. System load seems to have no effect on triggering the bug: it can happen at any time, whether the machine is idle or loaded.

Revision history for this message
In , Elizabethx-de-la-torre-mena (elizabethx-de-la-torre-mena) wrote :

(In reply to samuel.rakitnican from comment #260)
> I doesn't seem to be getting mentioned Gnome crashes on my sandybridge
> anymore with mainline kernels, that is currently 4.11 and I think even with
> 4.10 I was not getting any issues, with mainline longterm 4.4.61 and default
> centos 7 kernels I am definitely getting very frequent GPU crashes that
> brings down Gnome.
>
> So it is either fixed for good, or it become much rarer. The issue I am/was
> experiencing happens when Gnome is running, it does not happen when only GDM
> is loaded. System load seems to not have effect on the bug triggering, seems
> to happen any time, on idle, or when machine is loaded.
Hopefully it is fixed for good. I'm closing this bug. If problems arise with the latest kernel versions (https://www.kernel.org/), please open a NEW bug with HW and SW information, steps to reproduce, and relevant logs. Thank you.

Changed in dri:
status: In Progress → Fix Released
Changed in linux (Fedora):
importance: Unknown → Undecided
status: Unknown → Won't Fix
Revision history for this message
In , Chris Wilson (ickle) wrote :

(In reply to Elizabeth from comment #261)
> (In reply to samuel.rakitnican from comment #260)
> > I doesn't seem to be getting mentioned Gnome crashes on my sandybridge
> > anymore with mainline kernels, that is currently 4.11 and I think even with
> > 4.10 I was not getting any issues, with mainline longterm 4.4.61 and default
> > centos 7 kernels I am definitely getting very frequent GPU crashes that
> > brings down Gnome.
> >
> > So it is either fixed for good, or it become much rarer. The issue I am/was
> > experiencing happens when Gnome is running, it does not happen when only GDM
> > is loaded. System load seems to not have effect on the bug triggering, seems
> > to happen any time, on idle, or when machine is loaded.
> Hopefully, is fixed for good. I'm closing this bug, if problem arise with
> latest kernel versions https://www.kernel.org/ please open a NEW bug with HW
> and SW information, steps to reproduce and relevant logs.Thank you.

There was no fix for this HW issue.

Revision history for this message
In , Aaron-lu-a (aaron-lu-a) wrote :

Created attachment 135173
gpu error file on 4.13.5-200.fc26.x86_64

This problem reappeared on 4.13.5-200.fc26.x86_64 last Friday.

[774249.632109] [drm] GPU HANG: ecode 6:0:0x85fffff8, in Xorg [696], reason: Hang on rcs0, action: reset
[774249.632110] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[774249.632111] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[774249.632111] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[774249.632111] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[774249.632112] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[774249.632172] drm/i915: Resetting chip after gpu hang
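
When filing the new report that the kernel message asks for, it can help to pull the key fields out of the `GPU HANG` line so they can be quoted alongside the attached crash dump. This is a minimal sketch that assumes the line format shown above; the regular expression and the `parse_gpu_hang` helper are hypothetical, and no meaning is assigned to the individual parts of the ecode.

```python
import re

# Pattern derived from the single "GPU HANG" line in this report; other
# kernel versions may format the message differently.
GPU_HANG = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), "
    r"in (?P<process>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG message as a dict, or None if no match."""
    match = GPU_HANG.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])
    return info


LINE = (
    "[774249.632109] [drm] GPU HANG: ecode 6:0:0x85fffff8, in Xorg [696], "
    "reason: Hang on rcs0, action: reset"
)

hang = parse_gpu_hang(LINE)
print(hang)
```

The crash dump itself still has to be copied from `/sys/class/drm/card0/error`, as the log says, and attached to the report.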

Changed in dri:
status: Fix Released → Confirmed
Revision history for this message
In , Chris Wilson (ickle) wrote :

commit 0da715ee60774401bea00dc71fca6fd1096c734a
Author: Chris Wilson <email address hidden>
Date: Mon Nov 20 20:55:02 2017 +0000

    drm/i915: Disable semaphores on Sandybridge

Changed in dri:
status: Confirmed → Won't Fix
Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 104243 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 104304 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 104772 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Jani-saarinen-g (jani-saarinen-g) wrote :

I will close this now.

Revision history for this message
In , Chris Wilson (ickle) wrote :

*** Bug 106119 has been marked as a duplicate of this bug. ***

Revision history for this message
Timo Aaltonen (tjaalton) wrote :

closing mesa as fixed according to upstream years ago

Changed in mesa (Ubuntu):
status: Triaged → Fix Released
Changed in mesa (Ubuntu Precise):
status: Triaged → Fix Released
Changed in linux-lts-quantal (Ubuntu Precise):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released
Brad Figg (brad-figg)
tags: added: cscc
Changed in linux (Debian):
status: New → Fix Released