system hang: i915 Resetting rcs0 for hang on rcs0

Bug #1861395 reported by David Britton on 2020-01-30
This bug affects 25 people
Affects          Status                    Importance   Assigned to   Milestone
Linux            Fix Released              Unknown
linux (Ubuntu)   Status tracked in Focal
  Eoan           Invalid                   Undecided    Unassigned
  Focal          Fix Released              Undecided    Unassigned

Bug Description

The system hangs for an unknown reason. When this happens, the mouse pointer still moves, but I can't do anything else with the keyboard or by clicking in the UI. The only recovery I have found is a hard power-off.

Last bit of kern.log below:

Jan 30 12:43:51 aries kernel: [ 6649.263031] i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
Jan 30 12:43:51 aries kernel: [ 6649.263032] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Jan 30 12:43:51 aries kernel: [ 6649.263033] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Jan 30 12:43:51 aries kernel: [ 6649.263033] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Jan 30 12:43:51 aries kernel: [ 6649.263034] The GPU crash dump is required to analyze GPU hangs, so please always attach it.
Jan 30 12:43:51 aries kernel: [ 6649.263034] GPU crash dump saved to /sys/class/drm/card0/error
Jan 30 12:43:51 aries kernel: [ 6649.264039] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Jan 30 12:43:51 aries kernel: [ 6649.264778] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Jan 30 12:43:51 aries kernel: [ 6649.265046] i915 0000:00:02.0: Resetting chip for hang on rcs0
Jan 30 12:43:51 aries kernel: [ 6649.267018] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Jan 30 12:43:51 aries kernel: [ 6649.267764] [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Jan 30 12:43:59 aries kernel: [ 6657.262680] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Jan 30 12:44:01 aries kernel: [ 6659.246609] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Jan 30 12:44:09 aries kernel: [ 6667.246324] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Jan 30 12:44:09 aries kernel: [ 6667.494008] show_signal_msg: 20 callbacks suppressed
Jan 30 12:44:09 aries kernel: [ 6667.494011] GpuWatchdog[6827]: segfault at 0 ip 000055fd01917ded sp 00007f63043cc480 error 6 in chrome[55fcfd9dc000+7171000]
Jan 30 12:44:09 aries kernel: [ 6667.494017] Code: 48 c1 c9 03 48 81 f9 af 00 00 00 0f 87 c9 00 00 00 48 8d 15 a9 5a 9c fb f6 04 11 20 0f 84 b8 00 00 00 be 01 00 00 00 ff 50 30 <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 c1 6d a4 03 01 80 7d 8f 00
Jan 30 12:44:23 aries kernel: [ 6681.265885] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Jan 30 12:44:25 aries kernel: [ 6683.245838] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Jan 30 12:44:27 aries kernel: [ 6685.261749] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Jan 30 12:44:29 aries kernel: [ 6687.245641] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Jan 30 12:44:31 aries kernel: [ 6689.261618] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Jan 30 12:44:51 aries kernel: [ 6709.260901] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-12-generic 5.4.0-12.15
ProcVersionSignature: Ubuntu 5.4.0-12.15-generic 5.4.8
Uname: Linux 5.4.0-12-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu15
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Thu Jan 30 12:51:24 2020
InstallationDate: Installed on 2018-06-18 (591 days ago)
InstallationMedia: Ubuntu 18.04 LTS "Bionic Beaver" - Release amd64 (20180426)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-signed-5.4
UpgradeStatus: Upgraded to focal on 2020-01-22 (8 days ago)
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu16
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: dpb 115653 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2018-06-18 (604 days ago)
InstallationMedia: Ubuntu 18.04 LTS "Bionic Beaver" - Release amd64 (20180426)
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 004: ID 138a:0097 Validity Sensors, Inc.
 Bus 001 Device 003: ID 04f2:b5ce Chicony Electronics Co., Ltd Integrated Camera
 Bus 001 Device 002: ID 8087:0a2b Intel Corp.
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: LENOVO 20HRCTO1WW
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: linux (not installed)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-14-generic root=UUID=fa64d67d-26bf-4c42-a12f-c45b6ea5117c ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 5.4.0-14.17-generic 5.4.18
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-14-generic N/A
 linux-backports-modules-5.4.0-14-generic N/A
 linux-firmware 1.186
Tags: focal
Uname: Linux 5.4.0-14-generic x86_64
UpgradeStatus: Upgraded to focal on 2020-01-22 (21 days ago)
UserGroups: adm cdrom dip libvirt lpadmin lxd netdev plugdev sambashare sudo video
_MarkForUpload: True
dmi.bios.date: 11/25/2019
dmi.bios.vendor: LENOVO
dmi.bios.version: N1MET59W (1.44 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20HRCTO1WW
dmi.board.vendor: LENOVO
dmi.board.version: Not Defined
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.modalias: dmi:bvnLENOVO:bvrN1MET59W(1.44):bd11/25/2019:svnLENOVO:pn20HRCTO1WW:pvrThinkPadX1Carbon5th:rvnLENOVO:rn20HRCTO1WW:rvrNotDefined:cvnLENOVO:ct10:cvrNone:
dmi.product.family: ThinkPad X1 Carbon 5th
dmi.product.name: 20HRCTO1WW
dmi.product.sku: LENOVO_MT_20HR_BU_Think_FM_ThinkPad X1 Carbon 5th
dmi.product.version: ThinkPad X1 Carbon 5th
dmi.sys.vendor: LENOVO

David Britton (davidpbritton) wrote :
tags: added: champagne
Timo Aaltonen (tjaalton) wrote :

when it happens, grab the error dump from /sys/class/drm/card0/error (over ssh if there is no other way to reach the machine)

this probably needs to be filed upstream, as mentioned in the log
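
A minimal sketch of saving that error dump to a regular file so it can be attached to a report. The path is the one printed in the kernel log; the function name, the output file name and the assumption that the script runs with enough privileges to read the sysfs node are illustrative only.

```python
import shutil
import time
from pathlib import Path

# Path taken from the kernel log message quoted in this report.
ERROR_NODE = Path("/sys/class/drm/card0/error")


def save_gpu_error_dump(source: Path = ERROR_NODE, dest_dir: Path = Path(".")) -> Path:
    """Copy the i915 error state to a timestamped file and return its path.

    Hypothetical helper: the output naming scheme is an assumption.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"i915-error-{stamp}.txt"
    # Stream the content instead of relying on the file size reported by sysfs.
    with source.open("rb") as src, dest.open("wb") as out:
        shutil.copyfileobj(src, out)
    return dest
```

Reading the node usually requires root, so something like running the script under sudo (locally or in an ssh session) is probably needed.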

affects: linux-signed-5.4 (Ubuntu) → linux (Ubuntu)

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1861395

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Timo Aaltonen (tjaalton) wrote :

looks like chrome triggered the hang, so this is most likely the same as upstream https://gitlab.freedesktop.org/drm/intel/issues/673

a patch has been waiting to be applied to v5.4.x upstream for a month now, but we can apply it to our kernel first. I'll build a test kernel you can try

Timo Aaltonen (tjaalton) wrote :

5.4 needs this commit

https://<email address hidden>/

it's a slightly modified version of the one in 5.5

5.3 should _not_ be affected, contrary to what some are saying on the upstream bug

Changed in linux (Ubuntu Eoan):
status: New → Invalid
Timo Aaltonen (tjaalton) wrote :

please test the kernel at

https://aaltoset.kapsi.fi/5.4-hangfix

should be enough to install linux-image* and linux-modules* packages; first download them and then run 'sudo dpkg -i linux*.deb', and reboot once they've been installed.

Changed in linux:
status: Unknown → Fix Released
Chris Patterson (cjp256) wrote :

I built the kernel from master-next a week ago that included the fix commit. No problems since!

Thank you :)

apport information

tags: added: apport-collected
description: updated

David Britton (davidpbritton) wrote :

I think I hit the problem again, ran apport-collect on the kernel in proposed:

Linux aries 5.4.0-14-generic #17-Ubuntu SMP Thu Feb 6 22:47:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

ii linux-generic 5.4.0.14.17 amd64 Complete Generic Linux kernel and headers

:( What else can I try? After talking to the kernel team I'm assuming that this kernel from the proposed PPA has the fix, but it might not. I would love to verify.

Seth Forshee (sforshee) wrote :

That patch (drm/i915/gt: Detect if we miss WaIdleLiteRestore) first made it to our kernel in 5.4.0-13.16, so the current -proposed kernel does have the patch.
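
A rough sketch of how one could check a kernel version string against that first patched build. The threshold 5.4.0-13.16 comes from the comment above; the function names are hypothetical, and the comparison is only meaningful for Ubuntu 5.4 kernels that use this ABI.upload numbering.

```python
import re

# First Ubuntu 5.4 kernel carrying the patch, per the comment above.
FIRST_FIXED = (5, 4, 0, 13, 16)


def parse_ubuntu_kernel_version(text: str) -> tuple:
    """Extract e.g. (5, 4, 0, 14, 17) from '5.4.0-14.17' or a longer signature string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised kernel version: {text!r}")
    return tuple(int(part) for part in match.groups())


def has_patch(version: str) -> bool:
    """Return True if the version is at or after the first patched build."""
    return parse_ubuntu_kernel_version(version) >= FIRST_FIXED
```

The input can be a string in the form of the ProcVersionSignature field shown in the bug description, for example "Ubuntu 5.4.0-14.17-generic 5.4.18".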

Oded Arbel (oded-geek) wrote :

I think I'm triggering this problem with focal 5.4.0-14-generic #17-Ubuntu.

Kernel log says:

----8<----
Feb 24 15:27:53 vesho kernel: Asynchronous wait on fence i915:Xorg[2401]:1d16c timed out (hint:intel_atomic_commit_ready+0x0/0x54 [i915])
Feb 24 15:27:53 vesho kernel: Asynchronous wait on fence i915:Xorg[2401]:1d170 timed out (hint:intel_atomic_commit_ready+0x0/0x54 [i915])
Feb 24 15:27:53 vesho kernel: Asynchronous wait on fence i915:Xorg[2401]:1d16e timed out (hint:intel_atomic_commit_ready+0x0/0x54 [i915])
Feb 24 15:27:57 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:05 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:07 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:09 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:11 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:13 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:15 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:17 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:19 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:21 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:23 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:25 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:27 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:29 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:31 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:33 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:35 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:37 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:39 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:41 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:42 vesho kernel: GpuWatchdog[9598]: segfault at 0 ip 0000560157086e32 sp 00007f17aaa944c0 error 6 in chrome[560153140000+7287000]
Feb 24 15:28:42 vesho kernel: Code: 83 c3 e8 75 e9 41 8b 85 00 01 00 00 85 c0 0f 84 99 00 00 00 48 8d 3d 63 61 4b fb be 01 00 00 00 ba 03 00 00 00 e8 fe 17 a6 fe <c7> 04 25 00>
Feb 24 15:28:43 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:45 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:47 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:49 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:51 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:53 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:55 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:57 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:28:59 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 24 15:2...

Oded Arbel (oded-geek) wrote :

After updating to kernel mainline build 5.4.22-050422-generic #202002240833, I couldn't immediately trigger the problem (including scrubbing through a YouTube video on Chrome, as mentioned in upstream freedesktop ticket), even though previously I could usually get the problem triggered within a few minutes of logging in, without a YouTube video.

If this changes, I will update.

David Britton (davidpbritton) wrote :

Thanks oded-geek, did you happen to see a commit that might be responsible for more stability there, or was it just on a hunch that you tried out the mainline kernel?

Seth Forshee (sforshee) wrote :

Oded, we have a new kernel in focal-proposed (5.4.0-15.18) which includes up to 5.4.21 so it might be worth giving that a try. Otherwise we'll definitely have the 5.4.22 updates in the next kernel we upload.

Oded Arbel (oded-geek) wrote :

I just got the same freeze with the mainline 5.4.22. I'm trying the proposed kernel.

Oded Arbel (oded-geek) wrote :

@David - as far as I know, before reporting kernel issues, Ubuntu users are expected to try a repro with a mainline kernel. I've read (some of) the discussion in the Freedesktop bug report where people have proposed that various patches have possibly solved the problem, so I hit the mainline PPA and got the newest 5.4 :-). And it looked promising for a while.

Currently running 5.4.0-15-generic #18-Ubuntu from focal-proposed, and so far so good.

I have this bug with 18.04 LTS and kernel 5.3.0-40-generic.

Oded Arbel (oded-geek) wrote :

Just got this triggered with 5.4.0-15-generic #18-Ubuntu from focal-proposed.

Logs below.

If you want me to run another kernel, or try patched kernels, I can do that.

Kernel log:

----8<----
Feb 25 10:08:08 vesho kernel: i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
Feb 25 10:08:08 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 25 10:08:08 vesho kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Feb 25 10:08:08 vesho kernel: i915 0000:00:02.0: Resetting chip for hang on rcs0
Feb 25 10:08:08 vesho kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Feb 25 10:08:08 vesho kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Feb 25 10:08:16 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 25 10:08:24 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 25 10:08:26 vesho kernel: GpuWatchdog[15034]: segfault at 0 ip 000055af78399e32 sp 00007f8b359414c0 error 6 in chrome[55af74453000+7287000]
Feb 25 10:08:26 vesho kernel: Code: 83 c3 e8 75 e9 41 8b 85 00 01 00 00 85 c0 0f 84 99 00 00 00 48 8d 3d 63 61 4b fb be 01 00 00 00 ba 03 00 00 00 e8 fe 17 a6 fe <c7> 04 25 00>
Feb 25 10:08:28 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
...
Feb 25 10:09:18 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 25 10:09:20 vesho kernel: i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.
Feb 25 10:09:20 vesho kernel: i915 0000:00:02.0: Resetting chip for hang on rcs0
Feb 25 10:09:22 vesho kernel: i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.
Feb 25 10:09:22 vesho kernel: i915 0000:00:02.0: Resetting chip for hang on rcs0
Feb 25 10:09:22 vesho kernel: fbcon: Taking over console
Feb 25 10:09:23 vesho kernel: Console: switching to colour frame buffer device 240x67
Feb 25 10:09:30 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 25 10:09:38 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
...
Feb 25 10:10:32 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Feb 25 10:10:34 vesho kernel: i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.
Feb 25 10:10:34 vesho kernel: i915 0000:00:02.0: Resetting chip for hang on rcs0
Feb 25 10:10:36 vesho kernel: i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.
Feb 25 10:10:36 vesho kernel: i915 0000:00:02.0: Resetting chip for hang on rcs0
Feb 25 10:10:46 vesho kernel: GpuWatchdog[53827]: segfault at 0 ip 000055f286b35e32 sp 00007f396a9a24c0 error 6 in chrome[55f282bef000+7287000]
Feb 25 10:10:46 vesho kernel: Code: 83 c3 e8 75 e9 41 8b 85 00 01 00 00 85 c0 0f 84 99 00 00 00 48 8d 3d 63 61 4b fb be 01 00 00 00 ba 03 00 00 00 e8 fe 17 a6 fe <c7> 04 25 00>
Feb 25 10:10:56 vesho kernel: GpuWatchdog[53920]: segfault at 0 ip 0000555c042dee32 sp 00007f446a7b24c0 error 6 in chrome[555c00398000+7287000]
Feb 25 10:10:56 vesho kernel: Code: 83 c3 e8 75 e9 41 8b 85 00...

Andrea Righi (arighi) wrote :

I've uploaded a test kernel here: https://kernel.ubuntu.com/~arighi/LP-1853044/

It's basically 5.4.0-15-generic with the following upstream patches on top:

 8ee36e048c98 drm/i915/execlists: Minimalistic timeslicing
 b1339ecac661 drm/i915/execlists: Always force a context reload when rewinding RING_TAIL

Could you check whether it fixes the problem? Thanks!

bjo (bjo81) wrote :

Looking at the changelog of 5.4.0-16.19, this issue shouldn't appear any more, right?

bjo (bjo81) wrote :

Issue persists with kernel from https://kernel.ubuntu.com/~arighi/LP-1853044/ and also with 5.4.0-16.19.

Andrea Righi (arighi) wrote :

There was an off-by-one error in the patch backported to 5.4.0-16.19 (the same applies to my test kernel). For those who want to test it, please try the latest kernel from the unstable ppa (5.4.0-17.21):
https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/unstable/+packages

Oded Arbel (oded-geek) wrote :

The problem didn't reproduce with mainline PPA kernel 5.5.6. I'm now running "5.4.0-14-generic #17+lp1853044v1", which is fine so far, but due to the flu and extended weekend - I didn't have enough time to properly stress test this. Likely will happen tomorrow.

bjo (bjo81) wrote :

I can confirm that the issue does not appear with 5.5.6, but I was unable to test 5.4.0-17.21 yet.

Oded Arbel (oded-geek) wrote :

Just had an i915 crash on 5.4.0-14-generic #17+lp1853044v1. There appear to be 3 crashes, one right after the other, and I'm not sure what that means. It could be related to the fact that I have a script running that detects these crashes, waits 30 seconds and then restarts the display manager. But if it is related, then the additional crashes happened when only the display manager was running, and that has never happened to me before.

Kernel log:
----8<----
-- Logs begin at Thu 2020-02-20 17:15:40 IST, end at Tue 2020-03-03 12:31:08 IST. --
Mar 03 12:27:26 vesho kernel: i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
Mar 03 12:27:26 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Mar 03 12:27:26 vesho kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Mar 03 12:27:26 vesho kernel: i915 0000:00:02.0: Resetting chip for hang on rcs0
Mar 03 12:27:26 vesho kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Mar 03 12:27:26 vesho kernel: [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
Mar 03 12:27:29 vesho kernel: Asynchronous wait on fence i915:Xorg[1819]:87b8c timed out (hint:intel_atomic_commit_ready+0x0/0x54 [i915])
Mar 03 12:27:29 vesho kernel: Asynchronous wait on fence i915:Xorg[1819]:87b8e timed out (hint:intel_atomic_commit_ready+0x0/0x54 [i915])
Mar 03 12:27:34 vesho kernel: i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
Mar 03 12:27:35 vesho kernel: GpuWatchdog[14346]: segfault at 0 ip 0000561a55c29fa2 sp 00007ff6edf3b4c0 error 6 in chrome[561a51ce3000+7287000]
Mar 03 12:27:35 vesho kernel: Code: 83 c3 e8 75 e9 41 8b 85 00 01 00 00 85 c0 0f 84 99 00 00 00 48 8d 3d f3 60 4b fb be 01 00 00 00 ba 03 00 00 00 e8 be 17 a6 fe <c7> 04 25 00 00 00 00 37 13 00 00 c6 05 fc 76 b9 03 01 80 7d 8f 00
Mar 03 12:27:38 vesho kernel: mce: CPU7: Core temperature above threshold, cpu clock throttled (total events = 1)
Mar 03 12:27:38 vesho kernel: mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 03 12:27:38 vesho kernel: mce: CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
Mar 03 12:27:38 vesho kernel: mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 03 12:27:38 vesho kernel: mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 03 12:27:38 vesho kernel: mce: CPU7: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 03 12:27:38 vesho kernel: mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 03 12:27:38 vesho kernel: mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 03 12:27:38 vesho kernel: mce: CPU4: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 03 12:27:38 vesho kernel: mce: CPU5: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 03 12:27:38 vesho kernel: mce: CPU7: Core temperature/speed normal
Mar 03 12:2...
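
A minimal sketch of how hang reports like the ones in these logs could be counted from kernel log lines. This is not the detection script mentioned above; the function name and the choice to group resets by target are assumptions made for illustration. The patterns are modelled on the i915 messages quoted in this report.

```python
import re

# Patterns modelled on the i915 messages quoted in this report.
HANG_RE = re.compile(r"i915 [0-9a-f:.]+: GPU HANG: ecode ")
RESET_RE = re.compile(r"i915 [0-9a-f:.]+: Resetting (\w+) for hang on ")


def summarize_i915_hangs(lines):
    """Count GPU hang reports and reset attempts in an iterable of log lines.

    Returns (number_of_hang_reports, {reset_target: attempts}).
    """
    hangs = 0
    resets = {}
    for line in lines:
        if HANG_RE.search(line):
            hangs += 1
        match = RESET_RE.search(line)
        if match:
            target = match.group(1)  # e.g. "rcs0" or "chip"
            resets[target] = resets.get(target, 0) + 1
    return hangs, resets
```

The lines could come from any source of kernel messages, for example a saved kern.log opened as a text file.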

Andrea Righi (arighi) wrote :

@oded-geek sorry, there was an off by one bug in my custom kernel (I've removed it just to make sure nobody is doing other tests with it), could you try the latest kernel from the unstable ppa (5.4.0-17.21)?

https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/unstable/+packages

Thanks!

DooMMasteR (winrootkit-w) wrote :

I can reproduce the driver/GPU hang by playing back 4K H.264 video in celluloid/mpv.

[ 324.680024] i915 0000:00:02.0: GPU HANG: ecode 9:5:0x00000000, hang on rcs0, vcs0
[ 324.681031] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0, vcs0
[ 324.681065] i915 0000:00:02.0: Resetting vcs0 for hang on rcs0, vcs0

the system becomes nearly unresponsive after a short time (2-3 seconds after the first crash)

my CPU is an Intel(R) Core(TM) i5-6300U, no external GPU, 16 GB RAM in an HP Elitebook 820 G3, the iGPU has 512MB memory allocated.
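
A small sketch of pulling the fields out of a "GPU HANG" line such as the one above, which can help when comparing reports from different machines. The function name is hypothetical. The three ecode fields are returned as plain numbers without interpretation; reading them as hardware generation, engine mask and error code is an assumption that is not confirmed in this report.

```python
import re

HANG_LINE = re.compile(
    r"GPU HANG: ecode (\d+):([0-9a-f]+):(0x[0-9a-f]+), hang on (.+)$"
)


def parse_gpu_hang(line):
    """Return the ecode fields and the engine list from a GPU HANG line, or None."""
    match = HANG_LINE.search(line.strip())
    if match is None:
        return None
    first, second, code, engines = match.groups()
    return {
        # The second and third fields are parsed as hexadecimal (assumption).
        "ecode": (int(first), int(second, 16), int(code, 16)),
        "engines": [name.strip() for name in engines.split(",")],
    }
```

For the line quoted above this yields the engine list rcs0 and vcs0, which matches the two reset messages that follow it.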

DooMMasteR (winrootkit-w) wrote :

I disabled secure-boot and booted kernel 5.5.7 (mainline ppa) which does not exhibit the behavior…

On Wed, Mar 04, 2020 at 10:51:34PM -0000, DooMMasteR wrote:
> I can reproduce to get the driver/gpu hanging by playing back 4k h.264
> video in celluloid/mpv.
>
> [ 324.680024] i915 0000:00:02.0: GPU HANG: ecode 9:5:0x00000000, hang on rcs0, vcs0
> [ 324.681031] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0, vcs0
> [ 324.681065] i915 0000:00:02.0: Resetting vcs0 for hang on rcs0, vcs0
>
> the system becomes nearly unresponsive after a short time (2-3 seconds
> after the first crash
>
> my CPU is an Intel(R) Core(TM) i5-6300U, no external GPU, 16 GB RAM in
> an HP Elitebook 820 G3, the iGPU has 512MB memory allocated.

Which kernel version are you running when you see this?

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Oded Arbel (oded-geek) wrote :

Sorry for the long delay in testing - I've been cooped up at home due to illness, where my ability to test with different display configurations is severely limited.

That being said, I've been running the canonical-kernel-team/unstable 5.4.0-17.21 for about a week now, including two days in the office where I usually repro the problem, and no incidents so far.

I'm willing to call this "fixed".

Haw Loeung (hloeung) wrote :

Been running 5.4.0-17.21, also from the canonical-kernel-team/unstable PPA, for 5 days now and have yet to see any i915 resets or hangs.

tags: added: verification-done-focal
removed: verification-needed-focal
tags: added: verification-needed-focal
removed: verification-done-focal
Haw Loeung (hloeung) wrote :

Will try 5.4.0-18.22 in -proposed this weekend.

Haw Loeung (hloeung) wrote :

5.4.0-18.22 looks good too, been up for 2 days, 10:52.

tags: added: verification-done-focal
removed: verification-needed-focal
Changed in linux (Ubuntu Focal):
status: Incomplete → Confirmed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-18.22

---------------
linux (5.4.0-18.22) focal; urgency=medium

  * focal/linux: 5.4.0-18.22 -proposed tracker (LP: #1866488)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync getabis
    - [Packaging] update helper scripts

  * Add sysfs attribute to show remapped NVMe (LP: #1863621)
    - SAUCE: ata: ahci: Add sysfs attribute to show remapped NVMe device count

  * [20.04 FEAT] Compression improvements in Linux kernel (LP: #1830208)
    - lib/zlib: add s390 hardware support for kernel zlib_deflate
    - s390/boot: rename HEAP_SIZE due to name collision
    - lib/zlib: add s390 hardware support for kernel zlib_inflate
    - s390/boot: add dfltcc= kernel command line parameter
    - lib/zlib: add zlib_deflate_dfltcc_enabled() function
    - btrfs: use larger zlib buffer for s390 hardware compression
    - [Config] Introducing s390x specific kernel config option CONFIG_ZLIB_DFLTCC

  * [UBUNTU 20.04] s390x/pci: increase CONFIG_PCI_NR_FUNCTIONS to 512 in kernel
    config (LP: #1866056)
    - [Config] Increase CONFIG_PCI_NR_FUNCTIONS from 64 to 512 starting with focal
      on s390x

  * CONFIG_IP_MROUTE_MULTIPLE_TABLES is not set (LP: #1865332)
    - [Config] CONFIG_IP_MROUTE_MULTIPLE_TABLES=y

  * Dell XPS 13 9300 Intel 1650S wifi [34f0:1651] fails to load firmware
    (LP: #1865962)
    - iwlwifi: remove IWL_DEVICE_22560/IWL_DEVICE_FAMILY_22560
    - iwlwifi: 22000: fix some indentation
    - iwlwifi: pcie: rx: use rxq queue_size instead of constant
    - iwlwifi: allocate more receive buffers for HE devices
    - iwlwifi: remove some outdated iwl22000 configurations
    - iwlwifi: assume the driver_data is a trans_cfg, but allow full cfg

  * [FOCAL][REGRESSION] Intel Gen 9 brightness cannot be controlled
    (LP: #1861521)
    - Revert "USUNTU: SAUCE: drm/i915: Force DPCD backlight mode on Dell Precision
      4K sku"
    - Revert "UBUNTU: SAUCE: drm/i915: Force DPCD backlight mode on X1 Extreme 2nd
      Gen 4K AMOLED panel"
    - SAUCE: drm/dp: Introduce EDID-based quirks
    - SAUCE: drm/i915: Force DPCD backlight mode on X1 Extreme 2nd Gen 4K AMOLED
      panel
    - SAUCE: drm/i915: Force DPCD backlight mode for some Dell CML 2020 panels

  * [20.04 FEAT] Enable proper kprobes on ftrace support (LP: #1865858)
    - s390/ftrace: save traced function caller
    - s390: support KPROBES_ON_FTRACE

  * alsa/sof: load different firmware on different platforms (LP: #1857409)
    - ASoC: SOF: Intel: hda: use fallback for firmware name
    - ASoC: Intel: acpi-match: split CNL tables in three
    - ASoC: SOF: Intel: Fix CFL and CML FW nocodec binary names.

  * [UBUNTU 20.04] Enable CONFIG_NET_SWITCHDEV in kernel config for s390x
    starting with focal (LP: #1865452)
    - [Config] Enable CONFIG_NET_SWITCHDEV in kernel config for s390x starting
      with focal

  * Focal update: v5.4.24 upstream stable release (LP: #1866333)
    - io_uring: grab ->fs as part of async offload
    - EDAC: skx_common: downgrade message importance on missing PCI device
    - net: dsa: b53: Ensure the default VID is untagged
    - net: fib_rules: Correctly set table field when table number exceeds 8 bit...

Changed in linux (Ubuntu Focal):
status: Confirmed → Fix Released
Warren (warrenc5) wrote :

Unfortunately I can't upgrade from 5.4.0 so I added the following to /etc/modprobe.d/modesetting.conf

options i915 modeset=1 reset=1

See other options with modinfo i915

This seems to have made the problem occur less frequently.
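
A hedged sketch for checking which values the loaded i915 module actually picked up after such a change. It assumes the parameters are exposed under /sys/module/i915/parameters, the usual sysfs location for module parameters; some of them may be missing or readable only by root, in which case the sketch reports None. The function name is hypothetical.

```python
from pathlib import Path

# Usual sysfs location for parameters of a loaded module (assumed to be populated).
PARAM_DIR = Path("/sys/module/i915/parameters")


def read_module_params(param_dir: Path = PARAM_DIR, names=("modeset", "reset")):
    """Return {name: value or None} for the requested module parameters."""
    values = {}
    for name in names:
        try:
            values[name] = (param_dir / name).read_text().strip()
        except OSError:
            # Parameter not exported, or not readable with current privileges.
            values[name] = None
    return values
```

Options placed in /etc/modprobe.d only take effect when the module is next loaded, so the values read here reflect the settings from the last load, typically the last boot.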
