Mir

Mir servers can crash when a client crashes [terminate called after throwing an instance of 'std::out_of_range' what(): map::at]

Bug #1668466 reported by Daniel van Vugt
This bug affects 2 people
Affects         Status    Importance  Assigned to  Milestone
Mir             Triaged   Low         Unassigned
  0.26          Triaged   Low         Unassigned
mir (Ubuntu)    Triaged   Low         Unassigned
qtmir (Ubuntu)  Invalid   Undecided   Unassigned

Bug Description

When right clicking in GTK apps (to open a context menu), Mir servers often crash with:

terminate called after throwing an instance of 'std::out_of_range'
  what(): map::at

Thread 6 "Mir/IPC" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe7fff700 (LWP 24062)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
58 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1 0x00007ffff7a483ea in __GI_abort () at abort.c:89
#2 0x00007ffff632156d in __gnu_cxx::__verbose_terminate_handler() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff631f316 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff631e2a9 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff631ec5d in __gxx_personality_v0 ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff5d7ff43 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7 0x00007ffff5d80447 in _Unwind_Resume ()
   from /lib/x86_64-linux-gnu/libgcc_s.so.1
#8 0x00007ffff773c802 in ?? ()
   from /usr/lib/x86_64-linux-gnu/libmirserver.so.43
#9 0x00007ffff7717ab9 in ?? ()

Sounds similar to bug 1656727.
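
For readers unfamiliar with the message: with libstdc++, `what(): map::at` is the text carried by the `std::out_of_range` that `std::map::at()` throws when the key is absent. The sketch below illustrates that behaviour next to a non-throwing lookup. `SurfaceInfo`, `SurfaceMap`, `info_for_throwing` and `info_for_or_null` are hypothetical names made up for illustration, not Mir code.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for per-surface state; not a Mir type.
struct SurfaceInfo
{
    std::string name;
};

using SurfaceMap = std::map<int, std::shared_ptr<SurfaceInfo>>;

// Throws std::out_of_range if `id` is not in the map
// (libstdc++ reports this as "map::at").
std::shared_ptr<SurfaceInfo> info_for_throwing(SurfaceMap const& surfaces, int id)
{
    return surfaces.at(id);
}

// Non-throwing alternative: returns an empty pointer for an unknown id,
// so the caller must check the result before using it.
std::shared_ptr<SurfaceInfo> info_for_or_null(SurfaceMap const& surfaces, int id)
{
    auto const it = surfaces.find(id);
    if (it == surfaces.end())
        return nullptr;
    return it->second;
}
```

If nothing on the calling thread catches the exception from the first form, the C++ runtime calls `std::terminate`, which by default aborts the process with the "terminate called after throwing an instance of ..." message seen above.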

summary: - terminate called after throwing an instance of 'std::out_of_range'
- what(): map::at
+ Mir servers crash when right clicking in GTK apps [terminate called
+ after throwing an instance of 'std::out_of_range' what(): map::at]
Daniel van Vugt (vanvugt) wrote : Re: Mir servers crash when right clicking in GTK apps [terminate called after throwing an instance of 'std::out_of_range' what(): map::at]

I wonder if bug 1667645 is related?

Daniel van Vugt (vanvugt) wrote :

It seems we have tried to fix this crash previously as bug 1497128, and declared it fixed.

Daniel van Vugt (vanvugt) wrote :

This crash was first discovered in Unity8 this morning (bug 1668435), but then also found by me in Mir demo servers (I forget which ones).

description: updated
Daniel van Vugt (vanvugt) wrote :

Here's a reliable way to trigger the crash:
  1. Run nautilus under Mir.
  2. Select some icons.
  3. Right click on the selection.
  4. Click away to close the menu.
  5. Try to drag the selected icons.

Daniel van Vugt (vanvugt) wrote :

And here's a full stack trace of the server crash. The only map-related part I can see is:

#43 0x00007ffff78fd65d in std::map<int, std::shared_ptr<mir::frontend::detail::SocketConnection>, std::less<int>, std::allocator<std::pair<int const, std::shared_ptr<mir::frontend::detail::SocketConnection> > > >::erase (
    this=0x555555aa6538, __x=@0x7fffe6ffc584: 1)
    at /usr/include/c++/6/bits/stl_map.h:981
#44 0x00007ffff78fd1c7 in mir::frontend::detail::Connections<mir::frontend::detail::SocketConnection>::remove (this=0x555555aa6510, id=1)
    at /home/dan/bzr/mir/trunk/src/include/server/mir/frontend/connections.h:48
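
As a side note on the frames above: `std::map::erase(key)` does not itself throw `std::out_of_range`; it returns the number of elements removed (0 or 1). It does, however, destroy the stored `shared_ptr`, which runs the connection's destructor if that was the last reference, so the `map::at` failure presumably comes from code reached during that teardown rather than from the erase. A minimal sketch of that behaviour, using a hypothetical `ConnectionRegistry` that is only loosely modelled on the `Connections<>` template named in the trace:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>

// Hypothetical connection type; not mir::frontend::detail::SocketConnection.
struct Connection
{
    int id;
};

// Hypothetical registry, for illustration only.
class ConnectionRegistry
{
public:
    void add(std::shared_ptr<Connection> const& connection)
    {
        assert(connection != nullptr);
        connections[connection->id] = connection;
    }

    // Returns true if a connection with this id was present and removed.
    // Erasing an unknown id is a no-op, not an error.
    bool remove(int id)
    {
        return connections.erase(id) == 1;
    }

    std::size_t size() const
    {
        return connections.size();
    }

private:
    std::map<int, std::shared_ptr<Connection>> connections;
};
```

Calling `remove(1)` twice on a registry that holds connection 1 returns `true` the first time and `false` the second, without throwing.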

Daniel van Vugt (vanvugt) wrote :

Might be a side-effect of the client having crashed:

(nautilus:18313): Gdk-CRITICAL **: gdk_drag_find_window_for_screen: assertion 'GDK_IS_DRAG_CONTEXT (context)' failed

(nautilus:18313): Gdk-CRITICAL **: gdk_drag_motion: assertion 'GDK_IS_DRAG_CONTEXT (context)' failed
Segmentation fault (core dumped)

Alan Griffiths (alan-griffiths) wrote :

Hmm, testing with miral-app (zesty, mir 0.26.1, miral 1.2) this "just" crashes the client.

Changed in mir:
assignee: nobody → Alan Griffiths (alan-griffiths)
Daniel van Vugt (vanvugt) wrote :

I was testing both 0.26.1 zesty and lp:mir trunk. Both crashed.

Changed in mir:
status: New → In Progress
Alan Griffiths (alan-griffiths) wrote :

It took several attempts to reproduce a problem with mir_demo_server. It is easy to reproduce in mir_proving_server.

I suspect the mir_proving_server problem is manifesting in msh::BasicWindowManager::info_for() - probably an attempt to "window manage" a surface that has been deleted (this is what QtMir was doing in bug 1656727). As the resulting exception is allowed to propagate we don't see that stack trace on the crash dump.

This code isn't quite the same in mir_demo_server - which may be why it is harder to reproduce there.

The corresponding code in libmiral is what we're planning to use going forwards. We've done a lot more work on getting that right. As I can't reproduce the problem there I suspect it has been fixed as part of this work.

Reducing the priority as it only affects non-production code.

Changed in mir:
assignee: Alan Griffiths (alan-griffiths) → nobody
importance: Critical → Low
status: In Progress → Triaged
Daniel van Vugt (vanvugt) wrote :

It's not low priority. kgunn reported it (and I confirmed it) on Unity8 in zesty.

summary: - Mir servers crash when right clicking in GTK apps [terminate called
- after throwing an instance of 'std::out_of_range' what(): map::at]
+ Mir servers can crash when a client crashes [terminate called after
+ throwing an instance of 'std::out_of_range' what(): map::at]
Changed in mir:
importance: Low → High
Alan Griffiths (alan-griffiths) wrote :

OK, sorry, I didn't see the mention of Unity8 (or kgunn) in the description. I should follow all the links in comments.

Adding QtMir as it might have a similar cause to lp:1497128.

I'm still unable to reproduce using miral-app - which suggests there are two problems:

1. the deprecated window management code in mir_proving_server
2. racy code in Unity8/QtMir

I still think that /1/ is low priority.

Daniel van Vugt (vanvugt) wrote :

See also bug 1656727 (?)

Alan Griffiths (alan-griffiths) wrote :

No, in bug 1656727 the exception was thrown by libmiral code [miral::BasicWindowManager::info_for()] and caused by synchronisation issues in QtMir code.

As mentioned in #9 the proving server uses different (and suspect) code, while the demo_server uses me::BasicWindowManager::info_for().

A lot more work has gone into the miral version, which seems to work fine. We've yet to backport this work, but as the code seen failing here isn't used in production, it is low priority.

Changed in mir:
importance: High → Low
Changed in qtmir:
status: New → Invalid
Michał Sawicz (saviq)
affects: qtmir → qtmir (Ubuntu)
Changed in mir:
milestone: 0.27.0 → 0.28.0
Michał Sawicz (saviq) wrote :

Syncing task from Mir.

Changed in mir (Ubuntu):
importance: Undecided → Low
status: New → Triaged