libdrm-nouveau2 crashes X with kernel 3.13.0-58

Bug #1477801 reported by Daniel Barrett on 2015-07-24
This bug affects 2 people
Affects               Status         Importance   Assigned to
Nouveau Xorg driver   Fix Released   Critical
libdrm (Debian)       Fix Released   Unknown
libdrm (Ubuntu)       Confirmed      Undecided    Unassigned

Bug Description

(Apologies for not using ubuntu-bug, but I've been waiting an hour for it to get past "Collecting problem information".)

I upgraded my system to kernel 3.16.0-43.58~14.04.1 (amd64) today. X now crashes with the error "nouveau - gpu lockup" whenever I run Google Chrome (google-chrome-stable 44.0.2403.89-1). Big black boxes appear when I open a bunch of tabs and hover the mouse over the tabs. Eventually X crashes, freezing the whole screen with garbage all over it, and syslog contains the "nouveau" errors below.

The problem went away when I downgraded libdrm2 and libdrm-nouveau2 to 2.4.56-1~ubuntu2 (was 2.4.60-2~ubuntu14.04.1). I did this after reading https://bugs.freedesktop.org/show_bug.cgi?id=89842#c19 (see comment #19).

Here's the syslog. The "nouveau gpu lockup" error doesn't appear, but it did appear onscreen.

Jul 23 19:22:35 myhost kernel: [ 2568.498288] nouveau E[chrome[7073]] multiple instances of buffer 322 on validation list
Jul 23 19:22:35 myhost kernel: [ 2568.498297] nouveau E[chrome[7073]] validate_init
Jul 23 19:22:35 myhost kernel: [ 2568.498299] nouveau E[chrome[7073]] validate: -22
Jul 23 19:22:35 myhost kernel: [ 2568.514019] nouveau E[ PFIFO][0000:01:00.0] PFIFO: read fault at 0x0008101000 [PAGE_NOT_PRESENT] from (unknown enum 0x00000000)/GPC0/(unknown enum 0x0000000f) on channel 0x007f9af000 [unknown]
Jul 23 19:23:06 myhost kernel: [ 2598.744949] nouveau E[ DRM] GPU lockup - switching to software fbcon
Jul 23 19:23:23 myhost kernel: [ 2616.347931] nouveau E[Xorg[1550]] failed to idle channel 0xcccc0001 [Xorg[1550]]
Jul 23 19:23:38 myhost kernel: [ 2631.335533] nouveau E[Xorg[1550]] failed to idle channel 0xcccc0001 [Xorg[1550]]
Jul 23 19:23:40 myhost kernel: [ 2633.336751] nouveau E[ PFIFO][0000:01:00.0] playlist 0 update timeout
Jul 23 19:23:43 myhost kernel: [ 2635.629793] nouveau W[ PFIFO][0000:01:00.0] unknown status 0x00000100
Jul 23 19:23:47 myhost kernel: [ 2639.921155] nouveau W[ PFIFO][0000:01:00.0] unknown status 0x00000100
Jul 23 19:23:50 myhost vmnet-dhcpd: DHCPINFORM for 192.168.142.131 from 00:0c:29:d5:2d:15 via vmnet8
Jul 23 19:23:50 myhost vmnet-dhcpd: DHCPACK on 192.168.142.131 to 00:0c:29:d5:2d:15 via vmnet8
Jul 23 19:23:51 myhost kernel: [ 2644.212519] nouveau W[ PFIFO][0000:01:00.0] unknown status 0x00000100
Jul 23 19:23:55 myhost kernel: [ 2648.321472] nouveau E[Xorg[1550]] failed to idle channel 0xcccc0000 [Xorg[1550]]
Jul 23 19:23:56 myhost kernel: [ 2648.503871] nouveau W[ PFIFO][0000:01:00.0] unknown status 0x00000100
Jul 23 19:24:00 myhost kernel: [ 2652.795221] nouveau W[ PFIFO][0000:01:00.0] unknown status 0x00000100
Jul 23 19:24:04 myhost kernel: [ 2657.086572] nouveau W[ PFIFO][0000:01:00.0] unknown status 0x00000100
Jul 23 19:24:08 myhost kernel: [ 2661.377922] nouveau W[ PFIFO][0000:01:00.0] unknown status 0x00000100
Jul 23 19:24:10 myhost kernel: [ 2663.309074] nouveau E[Xorg[1550]] failed to idle channel 0xcccc0000 [Xorg[1550]]
Jul 23 19:24:12 myhost kernel: [ 2665.310067] nouveau E[ PFIFO][0000:01:00.0] playlist 0 update timeout
Jul 23 19:24:12 myhost colord: Automatic remove of icc-ed0e29bb4d99e8caee0ed705188568cc from xrandr-Dell Inc.-DELL 2405FPW-T61335980T0S
Jul 23 19:24:12 myhost colord: Profile removed: icc-ed0e29bb4d99e8caee0ed705188568cc
Jul 23 19:24:12 myhost colord: Profile removed: icc-cc453361e0e5fe47e15ec698dbee0254
Jul 23 19:24:12 myhost colord: device removed: xrandr-Dell Inc.-DELL 2405FPW-T61335980T0S
Jul 23 19:24:13 myhost kernel: [ 2665.669272] nouveau W[ PFIFO][0000:01:00.0] unknown status 0x00000100
Jul 23 19:24:17 myhost kernel: [ 2669.960633] nouveau W[ PFIFO][0000:01:00.0] unknown status 0x00000100
Jul 23 19:24:21 myhost kernel: [ 2674.252007] nouveau W[ PFIFO][0000:01:00.0] unknown status 0x00000100
Jul 23 19:24:26 myhost kernel: [ 2678.543381] nouveau W[ PFIFO][0000:01:00.0] unknown status 0x00000100
Jul 23 19:24:27 myhost kernel: [ 2680.434907] nouveau E[chrome[7073]] failed to idle channel 0xcccc0000 [chrome[7073]]
Jul 23 19:24:30 myhost kernel: [ 2682.834756] nouveau W[ PFIFO][0000:01:00.0] unknown status 0x00000100
Jul 23 19:24:34 myhost kernel: [ 2687.126132] nouveau W[ PFIFO][0000:01:00.0] unknown status 0x00000100
Jul 23 19:24:38 myhost kernel: [ 2691.417509] nouveau W[ PFIFO][0000:01:00.0] unknown status 0x00000100
Jul 23 19:24:42 myhost kernel: [ 2695.422508] nouveau E[chrome[7073]] failed to idle channel 0xcccc0000 [chrome[7073]]
Jul 23 19:24:43 myhost kernel: [ 2695.708885] nouveau W[ PFIFO][0000:01:00.0] unknown status 0x00000100
Jul 23 19:24:44 myhost kernel: [ 2697.420975] nouveau E[ PFIFO][0000:01:00.0] channel 4 [chrome[7073]] kick timeout
Jul 23 19:24:44 myhost kernel: [ 2697.423943] nouveau W[ PFIFO][0000:01:00.0] unknown status 0x00000100
Jul 23 19:24:46 myhost kernel: [ 2699.422229] nouveau E[ PFIFO][0000:01:00.0] playlist 0 update timeout
Jul 23 19:24:46 myhost kernel: [ 2699.422279] nouveau ![ PFIFO][0000:01:00.0] unhandled status 0x00000001
Jul 23 19:24:49 myhost kernel: [ 2701.560679] nouveau E[ PFIFO][0000:01:00.0] playlist 0 update timeout
Jul 23 19:24:51 myhost kernel: [ 2703.576392] nouveau E[ PFIFO][0000:01:00.0] playlist 0 update timeout

Created attachment 114766
Nouveau with Gnome 3.16.0 crash log

Using the nouveau driver, when I play with the new legacy tray in Gnome 3.16.0 (open it, close it, and reopen it), the entire system hangs and I must restart Gnome Shell and GDM.

Card: GeForce GTS 250

Versions:
Kernel 3.19.3-1-ARCH
xf86-video-nouveau 1.0.11-3
mesa 10.5.2-1

I've attached the journal log.

Note: I opened a related bug in GNOME (https://bugzilla.gnome.org/show_bug.cgi?id=747115); they advised me to open a bug here too.

Created attachment 114831
Attaching journalctl output after a Gnome 3.16 freeze. The freeze happened at about 23:06.

The Gnome team thinks I may be hitting the same issue.

NVIDIA Corporation GF119 [GeForce GT 610] (rev a1)

Versions:
Kernel 3.19.3-1-ARCH
xf86-video-nouveau 1.0.11-3
mesa 10.5.2-1

I've attached my journal log.

Somehow gnome-shell is able to convince nouveau to do something very dumb. I didn't even think this was possible... libdrm is supposed to de-dup these, no?

nouveau E[gnome-shell[1773]] multiple instances of buffer 215 on validation list
nouveau E[gnome-shell[1773]] validate_init
nouveau E[gnome-shell[1773]] validate: -22
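For context: on Linux, -22 is -EINVAL, so the kernel is rejecting the command submission because the same buffer appears twice on its validation list. The following minimal C sketch illustrates the kind of de-duplication the comment above expects from libdrm. It assumes hypothetical `struct bo` and `struct validate_list` types and a hypothetical `validate_add()` helper; these are not libdrm's real types or functions. The design point is to compare kernel buffer handles rather than client-side object pointers: if two distinct client objects happened to wrap the same kernel buffer, a pointer comparison would let both through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not libdrm's real structures. */
#define MAX_VALIDATE 64

struct bo {
    uint32_t handle;            /* kernel handle backing this buffer */
};

struct validate_list {
    uint32_t handles[MAX_VALIDATE];
    size_t count;
};

/* Queue a buffer for validation unless its kernel handle is already queued.
 * Returns false only when the list is full. */
static bool validate_add(struct validate_list *list, const struct bo *bo)
{
    assert(list != NULL && bo != NULL);

    for (size_t i = 0; i < list->count; i++) {
        if (list->handles[i] == bo->handle)
            return true;        /* already present: never add it twice */
    }
    if (list->count == MAX_VALIDATE)
        return false;
    list->handles[list->count++] = bo->handle;
    return true;
}
```

Adding two separate objects that both carry handle 322 leaves a single entry for that handle, so the kernel would never see "multiple instances of buffer 322".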

I've also been getting the same errors since April 1st:

Apr 01 11:07:08 arjen-imac.office.react.nl kernel: nouveau E[gnome-shell[4997]] multiple instances of buffer 228 on validation list
Apr 01 11:36:47 arjen-imac.office.react.nl kernel: nouveau E[gnome-shell[905]] multiple instances of buffer 255 on validation list
Apr 01 13:51:37 arjen-imac.office.react.nl kernel: nouveau E[gnome-shell[2939]] multiple instances of buffer 146 on validation list
Apr 02 12:20:26 arjen-imac.office.react.nl kernel: nouveau E[gnome-shell[2939]] multiple instances of buffer 415 on validation list
Apr 02 17:00:46 arjen-imac.office.react.nl kernel: nouveau E[gnome-shell[895]] multiple instances of buffer 327 on validation list

Just before the first crash I upgraded mesa:

[2015-04-01 09:55] [ALPM] upgraded mesa (10.5.1-2 -> 10.5.2-1)

Versions:
Kernel 3.19.2-1-ARCH
xf86-video-nouveau 1.0.11-3
mesa 10.5.2-1
Gnome 3.14.2

So I think this is mesa related, and not related to Gnome 3.16.

I'm guessing all you guys have libdrm-2.4.60 -- can you try downgrading to libdrm-2.4.59?

I have the same issue and can confirm that downgrading from libdrm-2.4.60 to libdrm-2.4.59 seems to stop the issue from happening, as there are no more hangs.

Thanks,
Rennie

git bisect identifies the first bad commit as:
commit 5ea6f1c32628887c9df0c53bc8c199eb12633fec
Author: Maarten Lankhorst <email address hidden>
Date: Thu Feb 26 11:54:03 2015 +0100

    nouveau: make nouveau importing global buffers completely thread-safe, with tests
...

The ArchLinux bug report (https://bugs.archlinux.org/task/44680) suggests an additional reproduction method: "when I move my mouse over VLC's seekbar and it shows a small tooltip to show the time gnome-shell freezes".

My favourite is "run mplayer with vdpau, then move the window". I arrived at that one by accident, but that repros it 100%. No compositors or anything like that.

*** Bug 90201 has been marked as a duplicate of this bug. ***

I just pushed a commit [1] to libdrm that should fix this issue.

[1] http://cgit.freedesktop.org/mesa/drm/commit/?id=812e8fe6ce46d733c30207ee26c788c61f546294

(In reply to Ben Skeggs from comment #10)
> I just pushed a commit [1] to libdrm that should fix this issue.
>
> [1] http://cgit.freedesktop.org/mesa/drm/commit/?id=812e8fe6ce46d733c30207ee26c788c61f546294

I can confirm that this fixes my repro case (move mplayer vdpau window around). I knew it was something relating to named bo's, so good to see that the fix also involved those.
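Since the comments above point at named (globally shared) buffer objects, here is a minimal, hypothetical C sketch of one way a client library can make importing such buffers idempotent: keep a per-device list of live buffers and hand back the existing object when the same global name is imported again. The names `struct device`, `struct bo` and `bo_import_by_name()` are illustrative only, not libdrm's API, and the kernel's handle allocation is faked with a counter. A real implementation would also have to lock around the lookup and insert (the bisected commit was about making this import path thread-safe); that is deliberately omitted here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical types for illustration; not libdrm's real structures. */
struct bo {
    uint32_t name;          /* global name the buffer was imported by */
    uint32_t handle;        /* per-process kernel handle */
    int refcount;
    struct bo *next;        /* link in the device's list of live buffers */
};

struct device {
    struct bo *bo_list;     /* all live buffers, searched on every import */
    uint32_t next_handle;   /* fake allocator standing in for the kernel */
};

/* Return the live object for `name` if there is one, otherwise create it.
 * NOTE: not thread-safe; a real version must lock around lookup + insert. */
static struct bo *bo_import_by_name(struct device *dev, uint32_t name)
{
    assert(dev != NULL);

    for (struct bo *bo = dev->bo_list; bo != NULL; bo = bo->next) {
        if (bo->name == name) {
            bo->refcount++;     /* reuse: same object, same handle */
            return bo;
        }
    }

    struct bo *bo = calloc(1, sizeof(*bo));
    if (bo == NULL)
        return NULL;
    bo->name = name;
    bo->handle = ++dev->next_handle;
    bo->refcount = 1;
    bo->next = dev->bo_list;
    dev->bo_list = bo;
    return bo;
}
```

Importing the same name twice yields the same pointer with a reference count of 2, so anything that later de-duplicates a validation list sees one buffer rather than two.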

Created attachment 115697
journalctl events when gnome-shell freezes

The Gnome version is 3.14.4-2-fc21. I reported this to the Gnome team (#749128); they referred me here. I had started a download in Firefox when this freeze occurred, but I have experienced it in other applications too.

On closer review of the thread above, I checked on downgrading libdrm from 2.4.60. In my installation, yum tells me I also need to downgrade libdrm-devel, and apparently the downgrade version is 2.4.58 rather than 2.4.59. Is that what you recommend?

(In reply to Joe Verreau from comment #13)
> In closer review of the thread above I checked on downgrading libdrm from
> 2.4.60. In my installation yum tells me I need to also downgrade
> libdrm-devel and apparently the downgrade version is 2.4.58 rather than
> 2.4.59. Is that what you recommend?

Joe, are you on the Fedora 22 beta? If so, an update will be going out soon. You can get it immediately from https://admin.fedoraproject.org/updates/FEDORA-2015-7930/libdrm-2.4.61-3.fc22

This is version 2.4.61, which fixes the regression. If you're not on F22, I expect updates will be coming soon. In the meantime, go ahead and downgrade to whatever works.

Matthew, actually I'm on Fedora 21, so I will downgrade libdrm and libdrm-devel to 2.4.58 and wait for the 2.4.61 update in the normal distribution. Thanks.

I too am experiencing the issue described in this ticket. It is affecting six machines, all running Fedora 21, which generally hang two or three times a day. Is there going to be an update to 2.4.61 pushed for Fedora 21 at some point?

I have a similar bug with

Debian Linux "testing" ("stretch")
GeForce 8400 GS Rev. 3
Linux 3.10-2-amd64
xserver-xorg-video-nouveau 1:1.0.11-1+b1
libdrm-nouveau1a 2.4.40-1~deb7u2
libdrm-nouveau2 2.4.60-3
libgl1-mesa-dri 10.5.7-1
libgl1-mesa-glx 10.5.7-1
Gnome 3.14.0-1

Created attachment 116767
when manipulating Gnome tray

See the error log produced by journalctl when manipulating the Gnome tray.

This is fixed by not using libdrm 2.4.60 which was a buggy release on the nouveau end. libdrm 2.4.59 or libdrm 2.4.61 should work fine.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libdrm (Ubuntu):
status: New → Confirmed
Peter Hurley (phurley) wrote :

libdrm-nouveau2 2.4.60 is broken.

Fix is here https://bugs.freedesktop.org/show_bug.cgi?id=89842#c10

Daniel Barrett (dbarrett-m) wrote :

Thank you Peter. Do you know if the 2.4.61 fix can be applied to Ubuntu 14.04 LTS officially? This is one deadly bug.

Peter Hurley (phurley) wrote :

Haven't tried yet; I will tonight.

Peter Hurley (phurley) wrote :

Yep, that bug fix works. I applied it to 2.4.60-2~ubuntu14.04.1, which is the broken version from trusty-updates, and confirm it fixes the observed problem in chrome + libdrm-nouveau2.

I pushed the repackaged build to my PPA, ppa:phurley/libdrm (or just downgrade to libdrm=2.4.56-1~ubuntu2).

Changed in nouveau:
importance: Unknown → Critical
status: Unknown → Fix Released
Daniel Barrett (dbarrett-m) wrote :

Thank you Peter!!!

I am not familiar with the migration path from a PPA to the official Ubuntu release. Is your fix likely to become part of official 14.04.1 LTS, and if so, how long does that usually take?

Changed in libdrm (Debian):
status: Unknown → Fix Released

... at this point I'm inferring there will not be upgraded versions of libdrm and libdrm-devel for fc21? I did downgrade my laptop from 2.4.60 to 2.4.58 in June, as noted above. I'm guessing the real fix is to move to fc22. I ask because my desktop is now also experiencing these freezes, though not as often as others have reported.

Witold Szczeponik (wsz) wrote :

I get the same crash with Ubuntu 14.04.3 LTS and "libdrm" from ubuntu-updates. Any chances of progressing the patched version from https://bugs.launchpad.net/ubuntu/+source/libdrm/+bug/1477801/comments/6 to the repositories?
