Mir/Unity8/USC crashes/freezes on nouveau (nv50) in pushbuf_kref() especially with multiple monitors, webbrowser-app or system settings

Bug #1553328 reported by Alexander Langanke on 2016-03-04
This bug affects 14 people
Affects                    Status    Importance  Assigned to   Milestone
Canonical System Image     -         Critical    Unassigned    -
Mir                        Triaged   High        Unassigned    -
Nouveau Xorg driver        Unknown   Unknown     -             -
Unity System Compositor    -         High        Unassigned    -
libdrm (Ubuntu)            -         Critical    Unassigned    -
mesa (Ubuntu)              -         Critical    Unassigned    -
mir (Ubuntu)               -         High        Unassigned    -
qtmir (Ubuntu)             -         High        Gerry Boland  -
qtubuntu (Ubuntu)          -         High        Gerry Boland  -

Nominated for Xenial by Alberto Salvia Novella: libdrm (Ubuntu), mesa (Ubuntu), mir (Ubuntu), qtmir (Ubuntu), qtubuntu (Ubuntu)

Bug Description

Unity8 froze up while I was trying to open system settings.

ProblemType: Crash
DistroRelease: Ubuntu 16.04
Package: unity8 8.11+16.04.20160216.1-0ubuntu1
ProcVersionSignature: Ubuntu 4.4.0-9.24-generic 4.4.3
Uname: Linux 4.4.0-9-generic x86_64
ApportVersion: 2.20-0ubuntu3
Architecture: amd64
Date: Fri Mar 4 19:12:54 2016
ExecutablePath: /usr/bin/unity8
InstallationDate: Installed on 2015-05-10 (299 days ago)
InstallationMedia: Ubuntu 15.04 "Vivid Vervet" - Release amd64 (20150422)
ProcCmdline: unity8
SegvAnalysis:
 Segfault happened at: 0x7f58d568706c: mov 0x8(%rsi),%edx
 PC (0x7f58d568706c) ok
 source "0x8(%rsi)" (0x00000008) not located in a known VMA region (needed readable region)!
 destination "%edx" ok
 Stack memory exhausted (SP below stack segment)
SegvReason: reading NULL VMA
Signal: 11
SourcePackage: unity8
StacktraceTop:
 ?? () from /usr/lib/x86_64-linux-gnu/libdrm_nouveau.so.2
 ?? () from /usr/lib/x86_64-linux-gnu/libdrm_nouveau.so.2
 ?? () from /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
 ?? () from /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
 ?? () from /usr/lib/x86_64-linux-gnu/mesa-egl/libEGL.so.1
Title: unity8 crashed with SIGSEGV
UpgradeStatus: Upgraded to xenial on 2015-11-07 (118 days ago)
UserGroups: adm autopilot cdrom dip lpadmin plugdev sambashare sudo

StacktraceTop:
 pushbuf_kref () from /tmp/apport_sandbox_ZyMo9z/usr/lib/x86_64-linux-gnu/libdrm_nouveau.so.2
 pushbuf_validate () from /tmp/apport_sandbox_ZyMo9z/usr/lib/x86_64-linux-gnu/libdrm_nouveau.so.2
 nv50_flush () from /tmp/apport_sandbox_ZyMo9z/usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
 st_glFlush () from /tmp/apport_sandbox_ZyMo9z/usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
 dri2_make_current () from /tmp/apport_sandbox_ZyMo9z/usr/lib/x86_64-linux-gnu/mesa-egl/libEGL.so.1

Changed in unity8 (Ubuntu):
importance: Undecided → Medium
tags: removed: need-amd64-retrace

Given the stack trace, this seems more like a crash in the nouveau driver.

Changed in unity8 (Ubuntu):
status: New → Incomplete
information type: Private → Public
summary: - unity8 crashed with SIGSEGV
+ unity8 crashed with SIGSEGV on nouveau, in eglMakeCurrent() ...
+ nv50_flush() ... pushbuf_kref()
Changed in mir (Ubuntu):
status: New → Invalid
Changed in mir:
status: New → Invalid
affects: unity8 (Ubuntu) → qtmir (Ubuntu)
David Planella (dpm) on 2016-04-08
tags: added: unity8-desktop

We should sanity-check the correctness of the eglMakeCurrent call in Mir src/server/graphics/nested/display_buffer.cpp:62

But other than that, it seems to be purely a nouveau bug.

Daniel van Vugt (vanvugt) wrote :

Done. There doesn't appear to be any way that Mir itself can screw up that call.

It's possible qtmir might be messing up with an incorrect 'this' pointer to the Mir DisplayBuffer. But I think it's much more likely to be a nouveau driver bug. If it weren't, we'd be seeing similar crashes with intel and radeon, but we don't.

Changed in libdrm (Ubuntu):
importance: Undecided → High
Daniel van Vugt (vanvugt) wrote :

Needs retesting. The symptoms might have changed: bug 1620934

Confirmed by duplicate/similar bug 1623507

summary: - unity8 crashed with SIGSEGV on nouveau, in eglMakeCurrent() ...
- nv50_flush() ... pushbuf_kref()
+ Mir crashes on nouveau (nv50) in pushbuf_kref()
Changed in qtmir (Ubuntu):
status: Incomplete → Invalid
Changed in mesa (Ubuntu):
importance: Undecided → High
Changed in libdrm (Ubuntu):
status: New → Confirmed
Changed in mesa (Ubuntu):
status: New → Confirmed
Changed in canonical-devices-system-image:
status: New → Confirmed
importance: Undecided → High
tags: added: nouveau
Daniel van Vugt (vanvugt) wrote :

Upstream says nouveau is known to not support multi-threaded rendering and will crash:
   https://bugs.freedesktop.org/show_bug.cgi?id=92438
So that completely explains yesterday's crash in bug 1623507.

Hopefully that vaguely explains this one too.

summary: - Mir crashes on nouveau (nv50) in pushbuf_kref()
+ Mir crashes on nouveau (nv50) in pushbuf_kref() especially with multiple
+ monitors
Changed in canonical-devices-system-image:
importance: High → Critical
Changed in libdrm (Ubuntu):
importance: High → Critical
Changed in mesa (Ubuntu):
importance: High → Critical
Changed in mesa (Ubuntu):
status: Confirmed → Invalid
Changed in mir (Ubuntu):
importance: Undecided → Critical
Changed in qtmir (Ubuntu):
importance: Medium → Critical
Changed in libdrm (Ubuntu):
status: Confirmed → Triaged

I guess even with a single display, multi-threaded apps like the web browser could still trigger this nouveau crash or similar by rendering from multiple threads.

There's a comment here suggesting the same crash might also happen in some processes like webbrowser-app even under Unity7:
  https://bugs.launchpad.net/ubuntu/+source/unity8-desktop-session/+bug/1595238/comments/15
which further supports the theory.

Based on upstream's comment https://bugs.freedesktop.org/show_bug.cgi?id=92438#c39 I would expect nouveau to continue to crash under both multi-monitor and other multi-threaded rendering conditions (like webbrowser-app?).

summary: - Mir crashes on nouveau (nv50) in pushbuf_kref() especially with multiple
- monitors
+ Mir/Unity8 crashes on nouveau (nv50) in pushbuf_kref() especially with
+ multiple monitors
summary: Mir/Unity8 crashes on nouveau (nv50) in pushbuf_kref() especially with
- multiple monitors
+ multiple monitors or opening the web browser app
summary: Mir/Unity8 crashes on nouveau (nv50) in pushbuf_kref() especially with
- multiple monitors or opening the web browser app
+ multiple monitors, webbrowser-app or system settings
summary: - Mir/Unity8 crashes on nouveau (nv50) in pushbuf_kref() especially with
- multiple monitors, webbrowser-app or system settings
+ Mir/Unity8 crashes/freezes on nouveau (nv50) in pushbuf_kref()
+ especially with multiple monitors, webbrowser-app or system settings
kevin gunn (kgunn72) on 2017-03-06
Changed in canonical-devices-system-image:
assignee: nobody → Stephen M. Webb (bregma)
milestone: none → u8c-1

I'm not sure this is useful to assign to that milestone. The bug and the fix are in the nouveau driver.

Although each GL client could work around it by moving all their GL rendering to a single thread, we really shouldn't have to do that just to support nouveau.
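
The client-side workaround mentioned above, moving all GL rendering to a single thread, can be sketched roughly as follows. This is only an illustration of the pattern in Python: `RenderThread`, `submit` and `stop` are hypothetical names, and the queued callables stand in for real GL calls against a context that is only ever current on that one thread.

```python
import queue
import threading


class RenderThread:
    """Runs every submitted callable on one dedicated thread (hypothetical helper)."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            job = self._jobs.get()
            if job is None:  # sentinel pushed by stop()
                return
            func, box, done = job
            try:
                box["value"] = func()
            except Exception as exc:  # hand the error back to the caller
                box["error"] = exc
            finally:
                done.set()

    def submit(self, func):
        """Queue func, wait for the render thread to run it, return its result."""
        box, done = {}, threading.Event()
        self._jobs.put((func, box, done))
        done.wait()
        if "error" in box:
            raise box["error"]
        return box.get("value")

    def stop(self):
        """Ask the render thread to exit and wait for it."""
        self._jobs.put(None)
        self._thread.join()
```

Any number of application threads can call `submit()`, but the driver would only ever see calls from one thread. The cost is that every client has to be restructured this way, which is the objection raised above.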

Daniel van Vugt (vanvugt) wrote :

See above comment

Changed in canonical-devices-system-image:
status: Confirmed → Incomplete
Gerry Boland (gerboland) wrote :

I have Qt configured (in QtUbuntu/QtMir) to use multi-threaded GL rendering, so we're probably hitting Nouveau's limitations here.

As a workaround, I can add code to QtUbuntu/QtMir to use single-threaded GL for Nouveau.

If this is easily reproduced, can you try

initctl set-env --global QSG_RENDER_LOOP=basic

and see if everything is more stable? If so, the workaround should do the trick.
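
For illustration, the code-side workaround described in this comment could look something like the sketch below. It is not the actual QtUbuntu/QtMir patch: the function name, the idea of matching the substring "nouveau" in a GL vendor string supplied by the caller, and the choice to leave an existing `QSG_RENDER_LOOP` value untouched are all assumptions. Only the variable name and the value `basic` come from this thread.

```python
import os


def apply_nouveau_workaround(gl_vendor, env=os.environ):
    """Set QSG_RENDER_LOOP=basic when the GL vendor string looks like nouveau.

    gl_vendor is assumed to come from the caller's own GL query. The
    substring match is an assumption, not the logic of the real patch.
    Returns True if this call set the variable.
    """
    if "nouveau" not in gl_vendor.lower():
        return False
    if "QSG_RENDER_LOOP" in env:
        # Respect an explicit user choice (assumed policy).
        return False
    env["QSG_RENDER_LOOP"] = "basic"
    return True
```

For this to have any effect, the variable would need to be set before Qt Quick chooses its render loop, so such a check would have to run early in start-up.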

Changed in qtmir (Ubuntu):
status: Invalid → Incomplete
Gerry Boland (gerboland) wrote :

Actually, might be easier/more-reliable just to edit /etc/environment and add "QSG_RENDER_LOOP=basic" and restart.

Daniel van Vugt (vanvugt) wrote :

Regardless of whether you have a single GPU or multiple GPUs, using OpenGL is a matter of "send it commands quickly and then forget (the GPU will complete them later)". So I'm not sure why QSG would bother with multi-threading GL at all. Similarly Mir doesn't really need to do the multi-threaded compositing it does, but most days it doesn't hurt us.

summary: - Mir/Unity8 crashes/freezes on nouveau (nv50) in pushbuf_kref()
+ Mir/Unity8/USC crashes/freezes on nouveau (nv50) in pushbuf_kref()
especially with multiple monitors, webbrowser-app or system settings
summary: - Mir/Unity8/USC crashes/freezes on nouveau (nv50) in pushbuf_kref()
- especially with multiple monitors, webbrowser-app or system settings
+ nouveau (nv50) crashes/freezes in pushbuf_kref()
summary: - nouveau (nv50) crashes/freezes in pushbuf_kref()
+ Mir/Unity8/USC crashes/freezes on nouveau (nv50) in pushbuf_kref()
+ especially with multiple monitors, webbrowser-app or system settings
Daniel van Vugt (vanvugt) wrote :

Per comment #17 it might actually be easiest to build a workaround in Mir for now... Either disable secondary compositor threads, or build a SingleThreadedCompositor to replace MultiThreadedCompositor.

In my spare time I'm working on an idea where post() could be made non-blocking. So that would eliminate the need for Mir's compositor to be threaded, and nouveau should then work (as well as it ever did for X).
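
The difference between the two compositor designs discussed here can be sketched as follows. The function names and the `render` callback are hypothetical and this is not Mir's API; it only contrasts one thread per display with a single loop over all displays.

```python
import threading


def composite_multi_threaded(displays, render):
    """One thread per display, the pattern that upsets nouveau."""
    threads = [threading.Thread(target=render, args=(d,)) for d in displays]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def composite_single_threaded(displays, render):
    """Render each display in turn on the calling thread."""
    for d in displays:
        render(d)
```

In the single-threaded form, a blocking call inside `render` for one display delays every other display, which is presumably why a non-blocking post() would make this approach more practical.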

Changed in mir:
status: Invalid → Confirmed
importance: Undecided → Medium
tags: added: multimonitor
Changed in qtmir (Ubuntu):
importance: Critical → Medium
status: Incomplete → Confirmed
Daniel van Vugt (vanvugt) wrote :

Even easier workaround for Mir/USC: Just default to clone mode instead of side-by-side, which is the problem in today's duplicate bug 1672793. Mir's legacy clone mode will at least ensure there is only one compositor thread, so nouveau should be stable then.

Changed in unity-system-compositor:
importance: Undecided → Medium
status: New → Confirmed
Changed in canonical-devices-system-image:
status: Incomplete → Triaged
Changed in canonical-devices-system-image:
milestone: u8c-1 → u8c-2
Daniel van Vugt (vanvugt) wrote :

I made an attempt at a workaround for nouveau crashes today (and discovered more nouveau bugs).

I can confirm with mir-demos that forcing the compositor into single-threaded mode makes it stable. The only problem is the unity-system-compositor option for doing this gets ignored (Unity8 overrides the display config to suit itself when it sees a second display). So you can't apply the workaround yourself.

So yes, medium term we could work around some of the nouveau stability issues by hacking Mir/USC/Unity8 to only use single threaded rendering. But that requires code changes in multiple places.

A short-term workaround that should do the trick:
  1. Unplug all but one monitor; and
  2. Add to /etc/environment: QSG_RENDER_LOOP=basic
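
Step 2 can be done by hand in any editor. For anyone scripting it, a minimal sketch in Python is below. The helper name `ensure_env_line` is hypothetical, and it assumes the file holds plain `KEY=value` lines.

```python
def ensure_env_line(text, key="QSG_RENDER_LOOP", value="basic"):
    """Return text with exactly one `key=value` line, added at the end."""
    prefix = f"{key}="
    # Drop any earlier assignment of the same key so it is not duplicated.
    kept = [line for line in text.splitlines()
            if not line.strip().startswith(prefix)]
    kept.append(f"{key}={value}")
    return "\n".join(kept) + "\n"


# Example use (needs root to write the file; applies after the next login):
# with open("/etc/environment") as f:
#     updated = ensure_env_line(f.read())
# with open("/etc/environment", "w") as f:
#     f.write(updated)
```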

Sadly I can't even test that much myself, because of bug 1677125.

Gerry Boland (gerboland) wrote :

I've an ancient NVidia box at home, can try it out. I'm attaching patches for qtubuntu/qtmir to force Qt to use single threaded GL on nouveau.

Changed in qtubuntu (Ubuntu):
status: New → In Progress
assignee: nobody → Gerry Boland (gerboland)
Changed in qtmir (Ubuntu):
assignee: nobody → Gerry Boland (gerboland)
status: Confirmed → In Progress
Changed in qtubuntu (Ubuntu):
importance: Undecided → High
Changed in qtmir (Ubuntu):
importance: Medium → High
Changed in mir (Ubuntu):
importance: Critical → High
status: Invalid → Triaged
Changed in unity-system-compositor:
importance: Medium → High
status: Confirmed → Triaged
Changed in mir:
importance: Medium → High
status: Confirmed → Triaged

The nouveau driver isn't thread-safe and the Qt renderer uses threaded GL by default.

We don't know if Mir's default renderer would also exhibit the same issue when running multimonitor. (The Mir renderer uses a thread per monitor, but this is a simpler usage pattern than Qt and may not manifest problems.)

Changed in mir:
status: Triaged → Incomplete
importance: High → Undecided
Changed in mir (Ubuntu):
status: Triaged → Incomplete
importance: High → Undecided
Changed in canonical-devices-system-image:
assignee: Stephen M. Webb (bregma) → nobody
Daniel van Vugt (vanvugt) wrote :

Alan, that question was already answered in comment #20. Single threaded rendering works.

Changed in mir:
status: Incomplete → Triaged
importance: Undecided → High
Changed in mir (Ubuntu):
status: Incomplete → Triaged
importance: Undecided → High

Daniel, you are saying comment #20 states that Mir's thread-per-monitor rendering manifests the problem? (I don't have any nouveau based kit to test on.)

Daniel van Vugt (vanvugt) wrote :

Well, yes. The problem description before that covers the fact that Mir's multi-threaded compositor manifests the problem (assuming you have multiple monitors). And comment #20 states that avoiding multiple threads (a single headed or cloned config) avoids the instability.

I don't think the problem description is clear that both the Mir and Unity8 compositors are affected. (It can be read as applying to the stack including Unity8.) Thanks for clarifying.

Daniel van Vugt (vanvugt) wrote :

Actually, the snapshotting/screencasting thread could also trigger it. If that's still a thing...?
