Mir

[regression] Clients are crashing (SIGSEGV) on mir_connect() failure

Bug #1358191 reported by John Lenton
34
This bug affects 4 people
Affects Status Importance Assigned to Milestone
Mir
Fix Released
Critical
Alexandros Frantzis
0.6
Won't Fix
Critical
Unassigned
0.7
Fix Released
Critical
Alexandros Frantzis
mir (Ubuntu)
Fix Released
Critical
Unassigned
mir (Ubuntu RTM)
Fix Released
Undecided
Unassigned

Bug Description

On image 195 on mako,

phablet@ubuntu-phablet:~$ mirping
Segmentation fault (core dumped)
phablet@ubuntu-phablet:~$ mirscreencast -n1
Segmentation fault (core dumped)

Related branches

Revision history for this message
John Lenton (chipaca) wrote :
Revision history for this message
John Lenton (chipaca) wrote :

Just checked the image 198 on the x86 emulator, same issue.

Changed in mir:
milestone: none → 0.7.0
importance: Undecided → High
Niklas Wenzel (nikwen)
Changed in mir:
status: New → Confirmed
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I've now tested:
  Mir 0.7 source
  Mir 0.6 source
  Mako image 179 (Mir 0.5.1)
  Mako image 199 (Mir 0.6.1)

None of them have crashing mirping/mirscreencast. Maybe it's fixed or I'm just not holding it right? :)

P.S. See also related bugs: https://bugs.launchpad.net/mir/+bugs?field.tag=screencast

tags: added: screencast
Changed in mir:
status: Confirmed → Incomplete
milestone: 0.7.0 → none
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Spoke too soon. My mistake was to log in as root (no crash). I can make it crash if run as phablet.

Changed in mir:
milestone: none → 0.7.0
status: Incomplete → Confirmed
summary: - mirping, mirscreencast segfault
+ mirping, mirscreencast segfault if run as non-root
Revision history for this message
Daniel van Vugt (vanvugt) wrote : Re: mirping, mirscreencast segfault if run as non-root

Same stack trace too:

(gdb) bt
#0 0xb6efbb5c in mir::client::DefaultConnectionConfiguration::~DefaultConnectionConfiguration() () from /usr/lib/arm-linux-gnueabihf/libmirclient.so.8
#1 0xb6eeeae4 in mir_connect ()
   from /usr/lib/arm-linux-gnueabihf/libmirclient.so.8
#2 0xb6eeebe0 in mir_connect_sync ()
   from /usr/lib/arm-linux-gnueabihf/libmirclient.so.8
#3 0x00010a84 in main ()

Changed in mir:
status: Confirmed → Triaged
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Still can't reproduce the problem using the latest source. Although based on when the regression appeared in touch images, I suspect this change which first appeared in Mir 0.6.1:

0.6:
------------------------------------------------------------
revno: 1846
author: Alexandros Frantzis <email address hidden>
committer: Cemil Azizoglu <email address hidden>
branch nick: 0.6
timestamp: Thu 2014-08-14 14:39:38 -0500
message:
    client: Rework mechanism to override Mir client functions.
------------------------------------------------------------

0.7:
------------------------------------------------------------
revno: 1834 [merge]
author: Alexandros Frantzis <email address hidden>
committer: Tarmac
branch nick: development-branch
timestamp: Mon 2014-08-11 21:56:45 +0000
message:
  client: Rework mechanism to override Mir client functions.

  Approved by PS Jenkins bot, Chris Halse Rogers, Kevin DuBois, Alan Griffiths.
------------------------------------------------------------

summary: - mirping, mirscreencast segfault if run as non-root
+ [regression] mirping, mirscreencast segfault if run as non-root
tags: added: regression
summary: - [regression] mirping, mirscreencast segfault if run as non-root
+ [regression] mirping, mirscreencast segfault
summary: - [regression] mirping, mirscreencast segfault
+ [regression] Simple clients segfault: mirping, mirscreencast
Revision history for this message
Daniel van Vugt (vanvugt) wrote : Re: [regression] Clients are crashing (SIGSEGV) on connect()

You can also see this bug causing phone apps to crash in the wild:
https://errors.ubuntu.com/problem/e1019855ecc2433fe4166e3a07f14aafa1de47ac

summary: - [regression] Simple clients segfault: mirping, mirscreencast
+ [regression] Clients are crashing (SIGSEGV) on connect()
Changed in mir:
importance: High → Critical
summary: - [regression] Clients are crashing (SIGSEGV) on connect()
+ [regression] Clients are crashing (SIGSEGV) on mir_connect()
tags: added: rtm14
Revision history for this message
Timo Jyrinki (timo-jyrinki) wrote : Re: [regression] Clients are crashing (SIGSEGV) on mir_connect()

Another duplicate bug #1358874

Changed in mir:
assignee: nobody → Daniel van Vugt (vanvugt)
status: Triaged → In Progress
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Confirmed better by duplicate bugs; comment #6 does seem to identify the source of the regression.

Changed in mir:
assignee: Daniel van Vugt (vanvugt) → nobody
status: In Progress → Triaged
Revision history for this message
Cemil Azizoglu (cemil-azizoglu) wrote :

Daniel,

Your comment #4 suggests that you were able to repro. But then comment #6 says you can't. So have you repro'ed?

I tried running as phablet but could not repro.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I could reproduce it a week ago (not reliably). But when I started actually working on the bug on Friday however I could not reproduce it at all. Even valgrind/helgrind are not suggesting any errors. It's very slippery. But at least the commit the problem started at seems clear. Unfortunately we can't simply revert the offending commit as other important fixes (LP: #1353867) seem to depend on it somewhat.

tags: added: touch-2014-08-28
Changed in mir (Ubuntu):
importance: Undecided → Critical
status: New → Triaged
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

And today I can reproduce it again :)

ii libmirclient8:armhf 0.6.1+14.10.20140814-0ubuntu1 armhf Display server for Ubuntu - client library

current build number: 203
device name: mako
channel: devel
alias: ubuntu-touch/utopic
last update: 2014-08-25 03:13:36
version version: 203
version ubuntu: 20140821.1
version device: 20140811.1

Changed in mir:
status: Triaged → In Progress
assignee: nobody → Daniel van Vugt (vanvugt)
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Ugh. Ignore comment #6. I've finally got my phone in a state where I can reproduce the crash and it keeps happening even with r1834 reverted. Same problem, but apparently r1834 didn't cause it.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :
Download full text (3.4 KiB)

Here's the valgrind output of the error from earlier in the day when I could reproduce it (on mako):

==8894== Invalid read of size 4
==8894== at 0x4867B78: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release() (shared_ptr_base.h:190)
==8894== by 0x4866B39: std::__weak_count<(__gnu_cxx::_Lock_policy)2>::~__weak_count() (shared_ptr_base.h:755)
==8894== by 0x4866279: std::__weak_ptr<mir::client::ClientPlatformFactory, (__gnu_cxx::_Lock_policy)2>::~__weak_ptr() (shared_ptr_base.h:1350)
==8894== by 0x4866293: std::weak_ptr<mir::client::ClientPlatformFactory>::~weak_ptr() (shared_ptr.h:467)
==8894== by 0x48662D3: mir::CachedPtr<mir::client::ClientPlatformFactory>::~CachedPtr() (cached_ptr.h:28)
==8894== by 0x486E3F7: mir::client::DefaultConnectionConfiguration::~DefaultConnectionConfiguration() (in /home/phablet/mir/boom/lib/libmirclient.so.8)
==8894== by 0x486E441: mir::client::DefaultConnectionConfiguration::~DefaultConnectionConfiguration() (default_connection_configuration.h:46)
==8894== by 0x4855E8F: std::default_delete<mir::client::ConnectionConfiguration>::operator()(mir::client::ConnectionConfiguration*) const (unique_ptr.h:76)
==8894== by 0x4855941: std::unique_ptr<mir::client::ConnectionConfiguration, std::default_delete<mir::client::ConnectionConfiguration> >::~unique_ptr() (unique_ptr.h:236)
==8894== by 0x4855071: (anonymous namespace)::DefaultMirConnectionAPI::connect(char const*, char const*, void (*)(MirConnection*, void*), void*) (mir_connection_api.cpp:75)
==8894== by 0x48552AB: mir_connect (mir_connection_api.cpp:140)
==8894== by 0x485530D: mir_connect_sync (mir_connection_api.cpp:156)
==8894== Address 0x5a92bf4 is not stack'd, malloc'd or (recently) free'd
==8894==
==8894==
==8894== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==8894== Access not within mapped region at address 0x5A92BF4
==8894== at 0x4867B78: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release() (shared_ptr_base.h:190)
==8894== by 0x4866B39: std::__weak_count<(__gnu_cxx::_Lock_policy)2>::~__weak_count() (shared_ptr_base.h:755)
==8894== by 0x4866279: std::__weak_ptr<mir::client::ClientPlatformFactory, (__gnu_cxx::_Lock_policy)2>::~__weak_ptr() (shared_ptr_base.h:1350)
==8894== by 0x4866293: std::weak_ptr<mir::client::ClientPlatformFactory>::~weak_ptr() (shared_ptr.h:467)
==8894== by 0x48662D3: mir::CachedPtr<mir::client::ClientPlatformFactory>::~CachedPtr() (cached_ptr.h:28)
==8894== by 0x486E3F7: mir::client::DefaultConnectionConfiguration::~DefaultConnectionConfiguration() (in /home/phablet/mir/boom/lib/libmirclient.so.8)
==8894== by 0x486E441: mir::client::DefaultConnectionConfiguration::~DefaultConnectionConfiguration() (default_connection_configuration.h:46)
==8894== by 0x4855E8F: std::default_delete<mir::client::ConnectionConfiguration>::operator()(mir::client::ConnectionConfiguration*) const (unique_ptr.h:76)
==8894== by 0x4855941: std::unique_ptr<mir::client::ConnectionConfiguration, std::default_delete<mir::client::ConnectionConfiguration> >::~unique_ptr() (unique_ptr.h:236)
==8894== by 0x4855071: (anonymous namespace)::Def...

Read more...

Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

The core issue seems to be that in this scenario the platform shared library is unloaded before the configuration object is being destroyed. The configuration object is holding a weak pointer to a shared pointer object (shared_ptr for the client platform factory) that was instantiated inside the platform library, and when the library is unloaded, type information and vtable for that shared pointer type vanishes too. When we try to destroy the weak_ptr we end up accessing memory that is invalid.

I have proposed an MP that spikes a fix for this: https://code.launchpad.net/~afrantzis/mir/fix-1358191-connect-segv-spike/+merge/232108 by keeping the library loaded for as long as needed.

Changed in mir:
milestone: 0.7.0 → 0.8.0
Changed in mir:
assignee: Daniel van Vugt (vanvugt) → Alexandros Frantzis (afrantzis)
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:mir/devel at revision None, scheduled for release in mir, milestone Unknown

Changed in mir:
status: In Progress → Fix Committed
summary: - [regression] Clients are crashing (SIGSEGV) on mir_connect()
+ [regression] Clients are crashing (SIGSEGV) on mir_connect() failure
Changed in mir:
milestone: 0.8.0 → 0.7.0
Changed in mir:
milestone: 0.7.0 → 0.8.0
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Fix committed to lp:mir/0.7 at revision 1880, scheduled for release in Mir 0.7.0

Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (3.2 KiB)

This bug was fixed in the package mir - 0.7.0+14.10.20140829-0ubuntu1

---------------
mir (0.7.0+14.10.20140829-0ubuntu1) utopic; urgency=medium

  [ Daniel van Vugt ]
  * New upstream release 0.7.0 (https://launchpad.net/mir/+milestone/0.7.0)
    - Enhancements:
      . Test suite: Reworked mechanism to override Mir client functions
      . Demo shell: Detect custom rendering (decorations) to make it
        compatible with overlay optimizations
      . Make sure to preserve fd resources until the end of the sending
        of the message
      . Add test cases and script for tracking changes to the new ABIs:
        libmircommon, libmirplatform
      . Symbols file for libmirplatform
      . Symbols file for libmircommon
      . Symbols file for libmirserver
      . Various improvements to the SessionMediator test
      . Various build related improvements
      . Print testcase output during package build
      . Abort test when InProcessServer startup fails
      . Link the integration and unit tests against the server objects
      . Add a document detailing the useful tests to run and the useful
        logs to collect when troubleshooting a new android chipset
      . Enable motion event resampling and prediction for a more responsive
        touch experience.
    - ABI summary: Servers need rebuilding, but clients do not
      . Mirclient ABI unchanged at 8
      . Mircommon ABI bumped to 1
      . Mirplatform ABI bumped to 2
      . Mirserver ABI bumped to 25
    - API changes
      . Deleted function - frontend::Shell::create_surface_for(). If you have
        the std::shared_ptr<frontend::Session> session, you can just do
        session->create_surface(params) instead to get a SurfaceId
    - Bug fixes:
      . Ensure we process lifecycle events before the nested server is torn
        down (LP: #1353465)
      . Fix race in InputTestingServerConfiguration (LP: #1354446)
      . Fix fd leaks in prompt session frontend code and tests (LP: #1353461)
      . Detect the additional things the demo shell draws on the renderable
        list and avoid calling the optimized post function if they are being
        drawn (LP: #1348330)
      . Client: Fix SIGTERM dispatch in our default lifecycle event handler
        (LP: #1353867)
      . DemoRenderer: Don't try to create a texture of width zero.
        (LP: #1358210)
      . Fix CI failures (LP: #1358698)
      . Fix build failure: "variable ‘rc’ set but not used" which happens in
        release mode when NDEBUG is set (LP: #1358625)
      . Only enumerate exposed input surfaces to avoid delivering events to
        occluded surfaces (LP: #1359264)
      . Android: do not post driver cancelled buffers (LP: #1359406)
      . Client: Ensure our platform library stays loaded for as long as it is
        needed by other objects (LP: #1358191)
      . Examples: Register the DemoCompositor with the Scene to properly
        process visibility events (LP: #1359487)
      . Mir_demo_client_basic: Don't assert on user errors like failing to
        connect to a Mir server (LP: #1331958)
      . Tests: Explicitly depend on GMock target to avoid build races
        (LP: #1362646)

  [ Ubuntu dai...

Read more...

Changed in mir (Ubuntu):
status: Triaged → Fix Released
Changed in mir:
milestone: 0.8.0 → none
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (11.9 KiB)

This bug was fixed in the package mir - 0.8.1+15.04.20150112.2~rtm-0ubuntu1

---------------
mir (0.8.1+15.04.20150112.2~rtm-0ubuntu1) 14.09; urgency=medium

  [ Daniel van Vugt ]
  * Bug fix release 0.8.1 (https://launchpad.net/mir/+milestone/0.8.1)
    - ABI summary: Servers need rebuilding, but clients do not;
      . Mirclient ABI unchanged at 8
      . Mircommon ABI unchanged at 2
      . Mirplatform ABI unchanged to 3
      . Mirserver ABI bumped to 26.1 (due to breakage in LP: #1355173)
    - Fixes bugs:
      . Switching windows with a Trusted Prompt Session active loses the
        trusted prompt session (LP: #1355173)
      . a prompt session with an invalid application pid should be an error
        (LP: #1377968)
  * mir_demo_client_basic: Don't assert on user errors like failing to
    connect to a Mir server. It's much more useful to tell the user what
    they did wrong than to automatically generate error reports and
    duplicate apport bugs. (LP: #1331958) . (LP: #1331958)
  * Enable "Project Butter" motion event resampling and prediction for a
    more responsive touch experience. .
  * When the server disconnects unexpectedly, give the client SIGHUP
    instead of the present SIGTERM. The former is more appropriate
    because it indicates: "Hangup detected on controlling terminal or
    death of controlling process" [signal(7)] whereas SIGTERM is
    reserved for more polite user-requested shutdowns. .
  * Fix the common/shared source tree layout inconsistency. If it goes
    in libmircommon then it should be called "common" and not "shared":
  * Move mir::time::*Clock out of libmirserver and into libmircommon. It
    will soon be used by libmirclient.
  * Bump version to 0.8.0 as Mir 0.7.0 was released today.
  * Remove dead code "kill_client_processes()" from mir_test_framework.
  * Work around spurious SIGKILL delivered to clients by Valgrind during
    tests. This is why CI is failing randomly (LP: #1364772) Actually
    this workaround is nothing new. A quick grep shows we already use
    the same hack in 13 other locations under tests/. . (LP: #1364772)
  * Privatise (don't install) headers that are unused by clients, QtMir
    or USC. This reduces the size of the public include/ tree from 377
    to 121 files, and reduces the number of those that get installed
    from 208 to 121.
  * Bump the mircommon ABI to 2. We actually broke it in the Mir 0.7.0
    release with symbols.map and didn't notice in time (LP: #1364890)
    (LP: #1364890)
  * Restore support for gcc-4.8/trusty (LP: #1366134) (LP: #1366134)
  * Publish an internal header that QtMir recently started using:
    mir/input/input_channel.h (LP: #1365934) . (LP: #1365934)
  * Relax dependencies on libmirplatform2. Any binary package that uses
    libmirplatform2 should already be ABI compatible with any version of
    libmirplatform2. Assuming we maintain our ABIs correctly...
    Minimising exact version requirements should minimise future package
    upgrade problems. .
  * Privatise more headers -- these are the ones unused by any external
    projects but are used internally by the server code in examples/
    only.
  * Introducing a generic client per...

Changed in mir (Ubuntu RTM):
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.