Mir

[regression] Mir servers just segfault just after "Selected driver:" instead of reporting exceptions

Bug #1528135 reported by Daniel van Vugt
This bug affects 1 person
Affects        Status         Importance   Assigned to       Milestone
Mir            Fix Released   High         Andreas Pokorny
0.18           Won't Fix      High         Unassigned
mir (Ubuntu)   Fix Released   Undecided    Unassigned

Bug Description

Mir server segfaults when trying to report exception - "Error opening DRM device".

I could not get a usable backtrace, but by deduction I was able to work out that this was the exception being thrown. Unfortunately Mir segfaults before it can report the error to the user. This does not happen in Mir 0.17.1 at least, but does happen with lp:mir ...

Try this from ssh while you're logged in to X, or on the login screen:

$ sudo bin/mir_demo_server --platform-graphics-lib=lib/server-modules/graphics-mesa-kms.so.7
MIR_CLIENT_PLATFORM_PATH=bin/../lib/client-modules/
MIR_SERVER_PLATFORM_PATH=bin/../lib/server-modules/
LD_LIBRARY_PATH=bin/../lib
exec=bin/mir_demo_server.bin
[1450687750.973798] mirserver: Starting
[1450687750.974124] mirserver: Selected driver: mesa-kms (version 0.19.0)
$ echo $?
139
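
The exit status 139 is how a POSIX shell reports a child killed by a signal: 128 plus the signal number, so 139 - 128 = 11, which is SIGSEGV on Linux. A small sketch of that decoding follows; the helper name is made up for illustration and is not part of Mir.

```cpp
#include <csignal>

// Hypothetical helper: recover the signal number from a shell "$?" value.
// Shells report "killed by signal N" as 128 + N; smaller values are
// ordinary exit codes, for which this returns 0.
constexpr int signal_from_shell_status(int status)
{
    return status > 128 ? status - 128 : 0;
}
```

Calling signal_from_shell_status(139) and comparing the result with SIGSEGV confirms that the server died from a segmentation fault rather than exiting with an error code of its own.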

When launched within an X session, the kms platform opens /dev/dri/card0 and then calls drmSetInterfaceVersion(drm_fd, {1,4,-1,-1}), which fails with errno 13 (Permission denied) because Mir has not switched VT. Further device nodes are tried after that, but none of them is a card, so opening them fails too. That failure is thrown as an exception, and stack unwinding back up to run_mir makes Mir unload the graphics platform.

Then a rethrow of the exception leads to a segmentation fault.
#0 0x00007ffff78dfc3b in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x00007ffff78e01bf in __gxx_personality_v0 () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x00007ffff764c203 in _Unwind_RaiseException () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#3 0x00007ffff764c50d in _Unwind_Resume_or_Rethrow () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#4 0x00007ffff78e095c in __cxa_rethrow () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff67661f0 in mir::report_exception (out=...) at /home/andreas/mir/mir/src/server/report_exception.cpp:30
#6 0x00007ffff677ef36 in mir::Server::run (this=this@entry=0x7fffffffe1d0) at /home/andreas/mir/mir/src/server/server.cpp:404
#7 0x00007ffff6b1dbb5 in main (argc=2, argv=0x7fffffffe408) at /home/andreas/mir/mir/examples/server_example.cpp:112
#8 0x0000000000400e8e in main (argc=2, argv=0x7fffffffe408) at /home/andreas/mir/mir/examples/mir_demo_server_loader.cpp:40

The rethrow appears to access unwind and throw-location information, which is no longer present because the platform library has already been unloaded.
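
Frames #4 and #5 show mir::report_exception reaching __cxa_rethrow, which is what a bare `throw;` compiles to. That is consistent with the common pattern of rethrowing the in-flight exception inside a helper in order to classify it. The following is only a minimal sketch of that pattern with made-up function names, not the contents of report_exception.cpp:

```cpp
#include <exception>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Illustrative stand-in for a "report the in-flight exception" helper.
// It must only be called from inside a catch block.
void report_current_exception(std::ostream& out)
{
    try
    {
        throw;  // rethrow whatever exception is currently being handled
    }
    catch (std::exception const& error)
    {
        out << "ERROR: " << error.what() << '\n';
    }
    catch (...)
    {
        out << "ERROR: unknown exception\n";
    }
}

// Hypothetical caller with the shape of a server's top-level run function.
std::string run_and_report()
{
    std::ostringstream out;
    try
    {
        throw std::runtime_error("Error opening DRM device");
    }
    catch (...)
    {
        report_current_exception(out);
    }
    return out.str();
}
```

Matching the rethrown exception against the catch clauses needs the exception's type information, and what() is a virtual call. If either lives in a module that has already been unloaded, those lookups touch unmapped memory, which would explain a crash inside the personality routine.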

Tags: regression


summary: - Mir server segfaults when trying to report exception - "Error opening
- DRM device"
+ [regression] Mir server segfaults when trying to report exception -
+ "Error opening DRM device"
Changed in mir:
milestone: none → 0.19.0
assignee: nobody → Daniel van Vugt (vanvugt)
status: New → In Progress
Daniel van Vugt (vanvugt) wrote : Re: [regression] Mir server segfaults when trying to report exception - "Error opening DRM device"

Bisected. The regression came in at:

------------------------------------------------------------
revno: 3070 [merge]
author: Andreas Pokorny <email address hidden>
committer: Alberto Aguirre <email address hidden>
branch nick: devel
timestamp: Mon 2015-11-02 12:52:36 -0600
message:
  Switch to UniqueModulePtr in graphics platform creation symbols

  With this the lifetime of the graphics platform shared module is tied to the lifetime of the graphics::Platform instance used by the server. So we can get rid of the global static SharedLibrary in mirserver. This is then identical to the lifetime tracking used for input platforms.

  The platform plays factory for Display, PlatformIPCOperations and GraphicsBufferAllocator. These instances are still created as plain shared ptrs. Currently DisplayServer guarantees that the Platform outlives those objects, if we dont want to guarantee that destruction ordering we could also use UniqueModulePtr for those - but that requires a slightly larger rework in integration tests.

  This change also required removing enable_shared_from_this from the kms version of mir::graphics::mesa::Platform. Capturing a UniqueModulePtr<T> with a shared_ptr<T> works, and also when enable_shared_from_this is used in T. It does not work when enable_shared_from_this is only added by a class derived from T.
------------------------------------------------------------
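
The enable_shared_from_this remark in the commit message can be reproduced with standard smart pointers. The sketch below uses std::unique_ptr as a stand-in for UniqueModulePtr and invented class names. It assumes only that the owning pointer is typed as the base class, as it would be when returned from a factory.

```cpp
#include <memory>
#include <utility>

struct Platform
{
    virtual ~Platform() = default;
};

// enable_shared_from_this is added only by the derived class.
struct KmsPlatform : Platform, std::enable_shared_from_this<KmsPlatform>
{
    bool can_share_self()
    {
        try
        {
            auto self = shared_from_this();
            return self != nullptr;
        }
        catch (std::bad_weak_ptr const&)
        {
            return false;  // no shared_ptr ever registered itself with us
        }
    }
};

// The owner is typed as the base class, so the shared_ptr that captures it
// only sees a Platform* and cannot find the enable_shared_from_this base.
bool shared_from_this_works_via_base_unique_ptr()
{
    std::unique_ptr<Platform> owned{new KmsPlatform};
    KmsPlatform* raw = static_cast<KmsPlatform*>(owned.get());
    std::shared_ptr<Platform> shared{std::move(owned)};
    return raw->can_share_self();
}

// For contrast: created as the derived type, the hook is found.
bool shared_from_this_works_via_derived_shared_ptr()
{
    auto shared = std::make_shared<KmsPlatform>();
    return shared->can_share_self();
}
```

The first function returns false and the second returns true, which matches the commit's observation that the combination fails only when the base type T itself lacks enable_shared_from_this.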

Changed in mir:
assignee: Daniel van Vugt (vanvugt) → nobody
status: In Progress → Triaged
description: updated
Andreas Pokorny (andreas-pokorny) wrote :

Rethrowing a boost exception that was created in a shared module which has since been unloaded seems to be problematic.

description: updated
description: updated
Changed in mir:
assignee: nobody → Andreas Pokorny (andreas-pokorny)
Andreas Pokorny (andreas-pokorny) wrote :

What makes this more interesting is that it happens right in make_module_ptr...

I will try attaching a reference to the shared library to the boost exception to keep the library alive.
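
The idea of attaching the library to the exception can be sketched without Boost or dlopen. Everything below is hypothetical: FakeModule stands in for a loaded shared library, and ModuleError is an invented exception type defined in the host program. It is not the fix that landed in Mir.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for a loaded module; flips a flag when it is "unloaded".
struct FakeModule
{
    explicit FakeModule(bool& loaded) : loaded(loaded) { loaded = true; }
    ~FakeModule() { loaded = false; }
    bool& loaded;
};

// Hypothetical exception type that can carry a reference to the module
// it originated from, keeping that module loaded while it is in flight.
struct ModuleError : std::runtime_error
{
    ModuleError(std::string const& what, std::shared_ptr<FakeModule> keep_alive)
        : std::runtime_error(what), keep_alive(std::move(keep_alive))
    {
    }
    std::shared_ptr<FakeModule> keep_alive;
};

// Returns whether the module was still loaded when the handler ran.
bool module_loaded_while_handling(bool attach_module)
{
    bool loaded = false;
    bool loaded_in_handler = false;
    try
    {
        auto module = std::make_shared<FakeModule>(loaded);
        std::shared_ptr<FakeModule> attached;
        if (attach_module)
            attached = module;
        // Unwinding destroys the local handles before the handler runs.
        throw ModuleError("Error opening DRM device", attached);
    }
    catch (ModuleError const&)
    {
        loaded_in_handler = loaded;
    }
    return loaded_in_handler;
}
```

With attach_module set to true the module is still loaded inside the handler; with false it has already been released during unwinding. In a real program the carrier type would have to live outside the module, since otherwise destroying the exception would run code from the library it is about to release.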

summary: - [regression] Mir server segfaults when trying to report exception -
- "Error opening DRM device"
+ [regression] Mir server segfaults when trying to report exception in the
+ driver module - "Error opening DRM device"
summary: - [regression] Mir server segfaults when trying to report exception in the
+ [regression] Mir server segfaults on startup just after "Selected
+ driver: mesa-kms". It's actually trying to report exception in the
driver module - "Error opening DRM device"
Daniel van Vugt (vanvugt) wrote :

Oh, I just wasted time re-bisecting a duplicate bug and found the regression happened in the same change (the first time it landed):

------------------------------------------------------------
revno: 3066 [merge]
author: Andreas Pokorny <email address hidden>
committer: Alberto Aguirre <email address hidden>
branch nick: trunk
timestamp: Thu 2015-10-29 18:37:56 -0500
message:
  Switch to UniqueModulePtr in graphics platform creation symbols

  With this the lifetime of the graphics platform shared module is tied to the lifetime of the graphics::Platform instance used by the server. So we can get rid of the global static SharedLibrary in mirserver.

  The platform plays factory for Display, PlatformIPCOperations and GraphicsBufferAllocator. These instances are still created as plain shared ptrs. Currently DisplayServer guarantees that the Platform outlives those objects, if we dont want to guarantee that destruction ordering we could also use UniqueModulePtr for those - but that requires a slightly larger rework in integration tests.
------------------------------------------------------------
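
Tying a module's lifetime to an object created from it is usually done with a deleter that owns a reference to the library. The sketch below illustrates that general technique with invented names (FakeLibrary, LibraryHoldingDeleter, ModuleOwnedPtr). It should not be read as the actual definition of UniqueModulePtr.

```cpp
#include <memory>
#include <string>
#include <vector>

// Stand-in for a dlopen()ed library; records its unload in an event log
// instead of calling dlclose(), so the ordering can be observed.
struct FakeLibrary
{
    explicit FakeLibrary(std::vector<std::string>& log) : log(log) {}
    ~FakeLibrary() { log.push_back("library unloaded"); }
    std::vector<std::string>& log;
};

struct FakePlatform
{
    explicit FakePlatform(std::vector<std::string>& log) : log(log) {}
    ~FakePlatform() { log.push_back("platform destroyed"); }
    std::vector<std::string>& log;
};

// Deleter that owns a reference to the library. The object is deleted
// first; the library reference is dropped when the deleter is destroyed.
template <typename T>
struct LibraryHoldingDeleter
{
    std::shared_ptr<FakeLibrary> library;
    void operator()(T* object) const { delete object; }
};

template <typename T>
using ModuleOwnedPtr = std::unique_ptr<T, LibraryHoldingDeleter<T>>;

std::vector<std::string> destruction_order()
{
    std::vector<std::string> log;
    {
        auto library = std::make_shared<FakeLibrary>(log);
        ModuleOwnedPtr<FakePlatform> platform{
            new FakePlatform(log), LibraryHoldingDeleter<FakePlatform>{library}};
        library.reset();  // the deleter now holds the last reference
    }
    return log;
}
```

The log ends up as "platform destroyed" followed by "library unloaded". The same ordering applies when the owning pointer is destroyed during stack unwinding, which fits the unload-before-report sequence described in this bug.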

summary: - [regression] Mir server segfaults on startup just after "Selected
- driver: mesa-kms". It's actually trying to report exception in the
- driver module - "Error opening DRM device"
+ [regression] Mir servers just segfault just after "Selected driver:"
+ instead of reporting exceptions
Daniel van Vugt (vanvugt) wrote :
Changed in mir:
assignee: Andreas Pokorny (andreas-pokorny) → Daniel van Vugt (vanvugt)
Changed in mir:
assignee: Daniel van Vugt (vanvugt) → Andreas Pokorny (andreas-pokorny)
status: Triaged → In Progress
no longer affects: mir/0.18
Changed in mir:
milestone: 0.19.0 → 0.20.0
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:mir at revision None, scheduled for release in mir, milestone 0.20.0

Changed in mir:
status: In Progress → Fix Committed
Changed in mir:
milestone: 0.20.0 → 0.19.0
Daniel van Vugt (vanvugt) wrote :

mir (0.19.0+16.04.20160128-0ubuntu1) xenial; urgency=medium

Changed in mir:
status: Fix Committed → Fix Released
Changed in mir (Ubuntu):
status: New → Fix Released