xorg crash/freeze when Chrome uses WebGL, caused by: GPU HANG: ... chrome ... reason: Hang on render ring, action: reset

Bug #1710051 reported by Andrew Montalenti
This bug affects 3 people
Affects: xserver-xorg-video-intel (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

I am running a Lenovo X1C 4th Generation on Ubuntu 17.04. With the stock xserver-xorg-video-intel, I am able to reliably reproduce a complete Xorg hang -- that occasionally results in a full system hang, requiring reboot -- simply by opening a 360 photograph (using WebGL under the hood) on Facebook.com running in Chrome with hardware acceleration enabled. This is using modesetting and thus glamor.

A similar hang also happens with this driver using certain software leveraging the GPU or hardware acceleration, for example the proprietary Zoom Video app. But since the Facebook example works very reliably (it crashes without fail), it has been good for testing/reproduction.

On a stock Ubuntu 17.04, this crash would result in a total system hang requiring a reboot. However, by enabling CTRL+ALT+BACKSPACE on my Xorg version using GNOME tweak, I was able to induce the crash, press CTRL+ALT+BACKSPACE, and end up in a virtual terminal. From there, I could inspect dmesg (to find the message reported in the summary, which has also been reported elsewhere on bug trackers in various forms). This is the more precise dmesg entry:

GPU HANG: ecode 9:0:0x86dffffd, in chrome [24053], reason: Hang on render ring, action: reset

I could also capture the state of the GPU/drm error by looking in /sys/class/drm/card0/error. That has been attached to this ticket.
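
For anyone collecting the same evidence, here is a minimal Python sketch of how the dmesg entry could be split into fields and the error state saved to a regular file. The regular expression is modelled only on the single line quoted above, so other kernel versions may word the message differently. The names parse_gpu_hang and save_error_state and the output file name gpu-error.txt are hypothetical, and reading the error file usually needs root.

```python
import re
from pathlib import Path

# Pattern modelled on the one dmesg line quoted in this report (assumption:
# other kernels may format the message differently).
GPU_HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+), in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a 'GPU HANG' line as a dict, or None if no match."""
    match = GPU_HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    return fields


def save_error_state(source="/sys/class/drm/card0/error", dest="gpu-error.txt"):
    """Copy the DRM error state to a regular file so it can be attached."""
    # Read everything first: sysfs files do not report a meaningful size.
    data = Path(source).read_bytes()
    Path(dest).write_bytes(data)
    return Path(dest)


if __name__ == "__main__":
    sample = ("GPU HANG: ecode 9:0:0x86dffffd, in chrome [24053], "
              "reason: Hang on render ring, action: reset")
    print(parse_gpu_hang(sample))
```

Saving the error state right after the hang matters because the file is the only detailed record of what the GPU was doing at the time.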

Some interesting notes about this:

- I could make Chrome not crash by turning off hardware acceleration in its advanced settings. In this case, chrome://gpu would show it is no longer using hardware acceleration. WebGL on Facebook.com would then be rendered in software, and there was no hang or crash.

- I upgraded to the xserver-xorg-video-intel version that is in this PPA: https://launchpad.net/~oibaf/+archive/ubuntu/graphics-drivers -- after this, the problem went away, definitively. I have only been running this version for a day, but have had no crashes, and the Facebook WebGL reproduction case is no longer a reproduction case.

My suspicion is that this is a bug deep in the intel driver and its interaction with certain GPUs, since the hang has happened in more than one userland program, and seems to be "repaired" by recent updates in the above-linked PPA.

Andrew Montalenti (andrewm-l) wrote :
description: updated
Andrew Montalenti (andrewm-l) wrote :

Also, a couple of days later, I experienced another full system hang. I was using GPU-accelerated Chrome at the time, but it happened while a new page was loading. So, the newer PPA may not have totally corrected this issue; perhaps it just moved it around somehow.

John Lenton (chipaca) wrote :

The package description of xserver-xorg-video-intel says,

 The use of this driver is discouraged if your hw is new enough (ca.
 2007 and newer). You can try uninstalling this driver and let the
 server use it's builtin modesetting driver instead.

Did you try removing the package entirely?

Andrew Montalenti (andrewm-l) wrote :

@chipaca Thanks for the suggestion. Interesting. I see a Reddit thread about this here:

https://www.reddit.com/r/archlinux/comments/4cojj9/it_is_probably_time_to_ditch_xf86videointel/

I might try that suggestion again. The Intel Graphics wiki page from Arch Linux has the following admonishment:

"Some (Debian & Ubuntu, Fedora, KDE) recommend not installing the xf86-video-intel driver, and instead falling back on the modesetting driver for fourth generation and newer GPUs. See [1], [2], Xorg#Installation, and modesetting(4). However, the modesetting driver can cause problems such as Chromium Issue 370022."

from https://wiki.archlinux.org/index.php/intel_graphics

Part of the reason I switched to xserver-xorg-video-intel, though, is that I was seeing atrocious animation performance with the default, as well as corruption issues in Chrome. So, I had assumed the built-in driver was somehow problematic. It seems like we need an expert to weigh in here on the right move for modern Ubuntu and modern Intel GPUs.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Confirmed
Andrew Montalenti (andrewm-l) wrote :

To help other users, I just want to mention that in Ubuntu 17.04, I ended up switching back to stable xorg and *removing* xserver-xorg-video-intel, as suggested in some online documentation, in order to force the modesetting driver to run. After that, I could see the modesetting and glamoregl modules loading in the Xorg.0.log file.
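
To check the log without reading it by eye, a small Python sketch along these lines may help. It assumes the usual `(II) LoadModule: "name"` lines that Xorg writes when it loads a module. The log location is also an assumption: it is often /var/log/Xorg.0.log, but some setups write it under the user's home directory instead. The function names are hypothetical.

```python
import re
from pathlib import Path

# Matches Xorg log lines such as: [    12.345] (II) LoadModule: "glamoregl"
LOAD_MODULE_RE = re.compile(r'\(II\) LoadModule: "([^"]+)"')


def loaded_modules(log_text):
    """Return the set of module names the X server reported loading."""
    return set(LOAD_MODULE_RE.findall(log_text))


def uses_modesetting(log_text):
    """True if both modesetting and glamoregl appear among loaded modules."""
    return {"modesetting", "glamoregl"} <= loaded_modules(log_text)


if __name__ == "__main__":
    # Assumed default path; adjust if your distribution logs elsewhere.
    log_path = Path("/var/log/Xorg.0.log")
    text = log_path.read_text(errors="replace")
    print("modules:", sorted(loaded_modules(text)))
    print("modesetting + glamor in use:", uses_modesetting(text))
```

If the result is False and "intel" shows up in the module list, the xserver-xorg-video-intel driver is most likely still installed and being picked up.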

Then, I disabled hardware acceleration in Chrome, using the official setting available in Chrome settings. (Not via a flag.) Search for "acceleration" in chrome://settings and then un-check that option.

With these two corrective actions, I seem to have a stable machine, even when running Chrome day-in day-out for a long time in gnome-shell. I also can't force a crash with Facebook 3D images any longer. This is probably the right set of recommendations for users with my Lenovo X1C model or similar Skylake & GPU hardware.

Maarten Jacobs (maarten256) wrote :

I have been seeing a similar issue for some time but hadn't gotten around to looking into it. My system was also running with the subject driver:

xserver-xorg-video-intel 2:2.99.917+git20160325-1ubuntu1.2 amd64 X.Org X server -- Intel i8xx, i9xx display driver

My graphics card:

01:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 610] (rev a1)

For experimentation's sake, I switched to the NVIDIA proprietary drivers to see if that helps in my case.
