unity8/libmirclient gives up and terminates prematurely with "std::exception::what: disconnected: no new buffers" via ExchangeSemantics::submit()

Bug #1506358 reported by Daniel van Vugt
56
This bug affects 7 people
Affects Status Importance Assigned to Milestone
Canonical System Image
Fix Released
High
kevin gunn
Mir
Fix Released
High
Kevin Dubois
0.23
Fix Released
High
Kevin Dubois
mir (Ubuntu)
Fix Released
High
Unassigned
xorg-server (Ubuntu)
Invalid
High
Unassigned

Bug Description

Under heavy resizing (stress testing under Valgrind), my Mir client (Xmir) crashed with:

[1444895332.392862] <ERROR> MirBufferStreamAPI: Caught exception at client library boundary (in mir_buffer_stream_swap_buffers): /build/mir-7io2Aj/mir-0.16.0+15.10.20150921.1/src/client/buffer_stream.cpp(169): Throw in function virtual MirWaitHandle* {anonymous}::ExchangeSemantics::submit(const std::function<void()>&, mir::geometry::Size, MirPixelFormat, int)
Dynamic exception type: boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >
std::exception::what: disconnected: no new buffers

The strange thing is it's only the client that died. The server survived and I can connect new clients to it.

---
Duplicate bug reports now show Unity8 is another such client suffering from this crash. It's occurring on phones and desktops in the wild.

Revision history for this message
Daniel van Vugt (vanvugt) wrote : Re: Mir client died with "std::exception::what: disconnected: no new buffers" via ExchangeSemantics::submit()

BTW, Valgrind found no errors in the client before the crash.

summary: Mir client died with "std::exception::what: disconnected: no new
- buffers"
+ buffers" via ExchangeSemantics::submit()
tags: added: xmir
summary: - Mir client died with "std::exception::what: disconnected: no new
- buffers" via ExchangeSemantics::submit()
+ libmirclient gives up and exits prematurely with "std::exception::what:
+ disconnected: no new buffers" via ExchangeSemantics::submit()
Changed in mir (Ubuntu):
importance: Undecided → High
Revision history for this message
Daniel van Vugt (vanvugt) wrote : Re: libmirclient gives up and exits prematurely with "std::exception::what: disconnected: no new buffers" via ExchangeSemantics::submit()

It appears the problem happens when the cursor is right on the corner of the window during rapid resize. Sometimes it escapes/enters the window during this time, which causes Xmir to innocently mir_buffer_stream_release_sync() the cursor, and the libmirclient dies:

==8162== Process terminating with default action of signal 6 (SIGABRT)
==8162== at 0x6F7C267: raise (raise.c:55)
==8162== by 0x6F7DEC9: abort (abort.c:89)
==8162== by 0x7F20B7C: __gnu_cxx::__verbose_terminate_handler() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==8162== by 0x7F1E9C5: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==8162== by 0x7F1EA10: std::terminate() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==8162== by 0x7F1EC78: __cxa_rethrow (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==8162== by 0x531D26D: mir::client::rpc::MirProtobufRpcChannel::send_message(mir::protobuf::wire::Invocation const&, mir::protobuf::wire::Invocation const&, std::vector<mir::Fd, std::allocator<mir::Fd> >&) (mir_protobuf_rpc_channel.cpp:238)
==8162== by 0x531CFB8: mir::client::rpc::MirProtobufRpcChannel::call_method(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, google::protobuf::MessageLite const*, google::protobuf::MessageLite*, google::protobuf::Closure*) (mir_protobuf_rpc_channel.cpp:210)
==8162== by 0x531B459: mir::client::rpc::DisplayServer::release_buffer_stream(mir::protobuf::BufferStreamId const*, mir::protobuf::Void*, google::protobuf::Closure*) (mir_display_server.cpp:127)
==8162== by 0x52A04B3: MirConnection::release_buffer_stream(mir::client::ClientBufferStream*, void (*)(MirBufferStream*, void*), void*) (mir_connection.cpp:655)
==8162== by 0x52F295C: mir::client::BufferStream::release(void (*)(MirBufferStream*, void*), void*) (buffer_stream.cpp:630)
==8162== by 0x53003E3: mir_buffer_stream_release (mir_buffer_stream_api.cpp:91)

Changed in xorg-server (Ubuntu):
importance: Undecided → High
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mir (Ubuntu):
status: New → Confirmed
Changed in xorg-server (Ubuntu):
status: New → Confirmed
Changed in xorg-server (Ubuntu):
status: Confirmed → Invalid
Changed in mir:
importance: High → Critical
Changed in mir (Ubuntu):
importance: High → Critical
Changed in mir:
status: New → Confirmed
description: updated
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Naively I would like to know; if there are "no new buffers" and if that's a blocker, then why is the client not just waiting for new buffers? Dying instead seems like an overreaction.

Changed in canonical-devices-system-image:
status: New → Confirmed
importance: Undecided → Critical
assignee: nobody → kevin gunn (kgunn72)
milestone: none → ww04-2016
Revision history for this message
kevin gunn (kgunn72) wrote :

just a note that bug 1532607 has been made a duplicate of this.
there's quite a bit of traffic & information on bug 1532607

summary: - libmirclient gives up and exits prematurely with "std::exception::what:
- disconnected: no new buffers" via ExchangeSemantics::submit()
+ libmirclient gives up and terminates prematurely with
+ "std::exception::what: disconnected: no new buffers" via
+ ExchangeSemantics::submit()
Revision history for this message
Kevin DuBois (kdub) wrote : Re: libmirclient gives up and terminates prematurely with "std::exception::what: disconnected: no new buffers" via ExchangeSemantics::submit()

This is expected behavior though when a nested server has its host server die (so I'd say this as described doesnt really describe a "bug")... The bug that this one was made a duplicate of seems to be the root cause.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

In the description I mentioned the server does not die at all. Only the client dies and the server keeps working (can reconnect to it).

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Dropped to High. If this bug only occurs /in the lab/ now then it's not a blocker for the phone any more.

Changed in mir:
importance: Critical → High
Changed in mir (Ubuntu):
importance: Critical → High
Changed in canonical-devices-system-image:
milestone: ww04-2016 → ww08-2016
kevin gunn (kgunn72)
Changed in canonical-devices-system-image:
importance: Critical → High
kevin gunn (kgunn72)
Changed in canonical-devices-system-image:
milestone: ww08-2016 → 11
kevin gunn (kgunn72)
Changed in canonical-devices-system-image:
milestone: 11 → backlog
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Still crashing the same way in xenial with Mir 0.21.0+16.04.20160330-0ubuntu1

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

==3369== Process terminating with default action of signal 6 (SIGABRT)
==3369== at 0x6EB8418: raise (raise.c:54)
==3369== by 0x6EBA019: abort (abort.c:89)
==3369== by 0x7DE184C: __gnu_cxx::__verbose_terminate_handler() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==3369== by 0x7DDF6B5: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==3369== by 0x7DDF700: std::terminate() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==3369== by 0x7DDF968: __cxa_rethrow (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==3369== by 0x52C1821: ??? (in /usr/lib/x86_64-linux-gnu/libmirclient.so.9)
==3369== by 0x52C6F36: ??? (in /usr/lib/x86_64-linux-gnu/libmirclient.so.9)
==3369== by 0x52C0280: mir_buffer_stream_release (in /usr/lib/x86_64-linux-gnu/libmirclient.so.9)
==3369== by 0x52C030C: mir_buffer_stream_release_sync (in /usr/lib/x86_64-linux-gnu/libmirclient.so.9)
==3369== by 0x43A737: xmir_unrealize_cursor (xmir-cursor.c:83)
==3369== by 0x502857: AnimCurUnrealizeCursor (animcur.c:282)

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

And again from:

==3839== Process terminating with default action of signal 6 (SIGABRT)
==3839== at 0x6EB8418: raise (raise.c:54)
==3839== by 0x6EBA019: abort (abort.c:89)
==3839== by 0x7DE184C: __gnu_cxx::__verbose_terminate_handler() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==3839== by 0x7DDF6B5: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==3839== by 0x7DDF700: std::terminate() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==3839== by 0x7DDF968: __cxa_rethrow (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==3839== by 0x52C1821: ??? (in /usr/lib/x86_64-linux-gnu/libmirclient.so.9)
==3839== by 0x52C6F36: ??? (in /usr/lib/x86_64-linux-gnu/libmirclient.so.9)
==3839== by 0x52C0280: mir_buffer_stream_release (in /usr/lib/x86_64-linux-gnu/libmirclient.so.9)
==3839== by 0x52C030C: mir_buffer_stream_release_sync (in /usr/lib/x86_64-linux-gnu/libmirclient.so.9)
==3839== by 0x43A737: xmir_unrealize_cursor (xmir-cursor.c:83)
==3839== by 0x55A1A0: FreeCursor (cursor.c:124)

Revision history for this message
Kevin DuBois (kdub) wrote :

Hmm, if the server isn't dying, perhaps we're getting a false disconnection signal from the rpc code?

Revision history for this message
Kevin DuBois (kdub) wrote :

If there's an issue in the transport in mclr::MirProtobufRpcChannel::send_message during the sending of the message, it will trigger the disconnection code.

If the client happens to be mid-mir_buffer_stream_swap_buffers at the same time that the the connection error trips the disconnection notification, we'll see that error. In the case of a true disconnection, the only reasonable thing the client code can do is to throw, as it can no longer appropriately fulfil the mir_buffer_stream_swap_buffers requirement.

Also, iirc, there are certain things that the client can try to do that will cause the server to disconnect it, maybe its an issue like that?

Changed in mir:
milestone: none → 0.24.0
Revision history for this message
Kevin DuBois (kdub) wrote :

I targeted to 0.24, should probably have a look, especially if we're getting error reports from the 'field'

Kevin DuBois (kdub)
Changed in mir:
milestone: 0.24.0 → 0.25.0
Changed in mir:
milestone: 0.25.0 → none
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

It wasn't till today, but finally there's a bug report of this happening in the wild on desktop -> bug 1603739

Changed in mir:
milestone: none → 0.25.0
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Another duplicate today.

This bug in libmirclient is now causing Untiy8 to fail start-up sometimes, and black-screen.

tags: added: black-screen unity8-desktop
summary: - libmirclient gives up and terminates prematurely with
+ unity8/libmirclient gives up and terminates prematurely with
"std::exception::what: disconnected: no new buffers" via
ExchangeSemantics::submit()
description: updated
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Although the offending exception might have been a regression from:

------------------------------------------------------------
revno: 2870 [merge]
author: Kevin DuBois <email address hidden>
committer: Tarmac
branch nick: development-branch
timestamp: Fri 2015-08-21 14:52:44 +0000
message:
  client: Do some internal refactoring to separate out mcl::BufferStream's logic that ties it to the BufferDepository. If the client gets a buffer at creation, it will use the exchange based system. If not, it will keep the current behavior of throwing an exception.
  .

  Approved by PS Jenkins bot, Alan Griffiths, Chris Halse Rogers, Alexandros Frantzis.
------------------------------------------------------------

We probably don't have to fix it directly... Because Mir 0.24.0 aims to avoid this old code path for good, and you shouldn't ever hit it again.

Changed in mir:
milestone: 0.25.0 → 0.24.0
status: Confirmed → Fix Committed
assignee: nobody → Kevin Dubois (kevin-dubois05)
Changed in mir (Ubuntu):
status: Confirmed → Triaged
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

This bug was fixed in the package mir - 0.23.5+16.10.20160729-0ubuntu1

Changed in mir (Ubuntu):
status: Triaged → Fix Released
Changed in canonical-devices-system-image:
status: Confirmed → Fix Committed
Changed in mir:
status: Fix Committed → Fix Released
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

The correct milestone for this bug is 13 (OTA-13) in canonical-devices-system-image.

Changed in canonical-devices-system-image:
milestone: backlog → 13
Changed in canonical-devices-system-image:
status: Fix Committed → Fix Released
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Still happening in Mir 0.24.0 using NBS. See: bug 1627311

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.