Mir

Intermittent hang in ClientPidTestFixture.authorizer_may_prevent_connection_of_clients test

Bug #1201436 reported by Alexandros Frantzis
28
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Mir
Fix Released
High
Alexandros Frantzis
mir (Ubuntu)
High
Alexandros Frantzis

Bug Description

I noticed some Intermittent hangs in the ClientPidTestFixture.authorizer_may_prevent_connection_of_clients tests. They are present on both Android and the desktop, although they are much more easily reproducible on an Android device. Run each test with --gtest_repeat=N to catch this (and even then may need multiple runs).

Related branches

Revision history for this message
kevin gunn (kgunn72) wrote :

this should be fixed for 13.10

Changed in mir:
importance: Undecided → High
importance: High → Critical
status: New → Triaged
Revision history for this message
Robert Carr (robertcarr) wrote :

I'll investigate!

Changed in mir:
assignee: nobody → Robert Carr (robertcarr)
Revision history for this message
Robert Carr (robertcarr) wrote :

I bet the solution to this is contained in client-focus-notifications. Let's see though.

Changed in mir:
importance: Critical → High
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

Problem analysis:

In mir_default_connect() we call connection->connect() which sets up a callback to connection::connected() (and then to the client callback).

Having done this the function can propagate an exception caught in mir_default_connect() - where a different connection object is created and returned to the client.

This means that there is nothing to stop the client continuing to a point where the above mentioned callback touches obsolete objects (e.g. the client "context") because the client code is syncing against the second connection object.

So, occasionally, the callback occurs after the test teardown starts - and chaos ensues.

Changed in mir:
assignee: Robert Carr (robertcarr) → Alan Griffiths (alan-griffiths)
Changed in mir:
status: Triaged → In Progress
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:~mir-team/mir/development-branch at revision None, scheduled for release in mir, milestone Unknown

Changed in mir:
status: In Progress → Fix Committed
Changed in mir:
milestone: none → phone-v1-freeze
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Reopened. I'm still getting this with the latest development-branch today;

[ RUN ] ClientPidTestFixture.authorizer_may_prevent_connection_of_clients
unknown file: Failure
C++ exception with description "Timeout while waiting for child to change state" thrown in TearDown().
[ FAILED ] ClientPidTestFixture.authorizer_may_prevent_connection_of_clients (60006 ms)

Changed in mir:
status: Fix Committed → Triaged
assignee: Alan Griffiths (alan-griffiths) → nobody
tags: added: testsfail
Changed in mir:
milestone: phone-v1-freeze → 0.0.17
Changed in mir:
assignee: nobody → Alexandros Frantzis (afrantzis)
Changed in mir:
status: Triaged → In Progress
Revision history for this message
Alexandros Frantzis (afrantzis) wrote :

Problem analysis for new issue:

When a client connection is rejected by the server, the connection is closed. Depending on the timing, the client will detect the connection break either while sending the request to the server, or when it is trying to read back the response.

In the first case, an exception is thrown while calling the server.connect() method, and an error connection is returned to the client (see mir_default_connect() in mir_client_library.cpp). The client can then successfully call mir_connection_release() on the error connection.

In the second case, an exception is thrown while trying to read back the server response, causing the client stack to force completion of pending RPC calls (the connect() call). The client ends up with an invalid connection object, but not an error connection object. Calling mir_connection_release() on this object blocks, because, although an exception is thrown by the server.disconnect() method, the disconnect_wait_handle is not updated. This happens because in MirSocketRpcChannel::notify_disconnected() the pending RPC calls are not forced to complete, since the channel has already been marked as 'disconnected' by the previous connect failure.

This bug is an instance of calling a function that talks to the server after a connection break has been detected in a previous call. Note that if the break is first detected while calling the funcion, then the call doesn't block, since this is the first time we call MirSocketRpcChannel::notify_disconnected() and the pending RPC calls are forced to complete.

See https://code.launchpad.net/~afrantzis/mir/fix-1201436-more/+merge/191784 for a suggested solution.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:~mir-team/mir/development-branch at revision None, scheduled for release in mir, milestone Unknown

Changed in mir:
status: In Progress → Fix Committed
Changed in mir (Ubuntu):
status: New → Triaged
importance: Undecided → High
Changed in mir:
status: Fix Committed → Fix Released
Changed in mir (Ubuntu):
assignee: nobody → Alexandros Frantzis (afrantzis)
status: Triaged → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (5.1 KiB)

This bug was fixed in the package mir - 0.1.0+14.04.20131028-0ubuntu1

---------------
mir (0.1.0+14.04.20131028-0ubuntu1) trusty; urgency=low

  [ Daniel van Vugt ]
  * Bump version 0.1.0
  * Add method for testing if Rectangle::contains(Rectangle), which is
    the basis of any occlusion detection. (LP: #1227739)
  * Add support for traversing the Scene from front surface to back.
    This is required for occlusion detection at least (coming soon). .
    (LP: #1227739)
  * Optimization: Turn off blending for surfaces that are not blendable.
    On some systems this can have a noticeable performance benefit.
  * Avoid rendering surfaces that are fully hidden by other surfaces.
    This is particularly important for mobile device performance. (LP:
    #1227739) . (LP: #1227739)
  * Remove orphaned tags, which appear to have come from the Compiz
    project (!?) Add tags for the most recent releases up to 0.0.16. No
    files changed, only tags.
  * Fix significant performance issues LP: #1241369 / LP: #1241371, and
    probably more(!) Added regression test to catch such regressions and
    revert the offending commit r1049. (LP: #1241369, #1241371)

  [ Brandon Schaefer ]
  * When Scroll events come in we don't keep around the android vscroll
    hscroll data. Store it now. (LP: #1233089)

  [ Albert Astals ]
  * Fix mismatched free() / delete / delete [] reported by valgrind

  [ Alexandros Frantzis ]
  * server: Extend server status (formerly pause/resume) listener to
    report "started" events This change is needed by users of
    libmirserver, so they can properly synchronize external interactions
    with the server. (LP: #1239876)
  * graphics,examples: Don't enable more outputs than supported when
    changing the display configuration. (LP: #1217877)
  * client: Allow clients to call API functions after a connection break
    has been detected When a client tries to call an API function after
    a connection break has been detected in a previous API call, the
    client blocks in the new call. This happens because in
    MirSocketRpcChannel::notify_disconnected() the pending RPC calls are
    not forced to complete, since the channel has already been marked as
    'disconnected' by the failure in the previous call. Note that if the
    break is first detected while calling an API function, then that
    call doesn't block, since this is the first time we call
    MirSocketRpcChannel::notify_disconnected() and the pending RPC calls
    are forced to complete. This commit solves this problem by always
    forcing requests to complete when a communication failure occurs,
    even if a disconnection has already been handled. This is preferred
    over the alternative of manually calling the completion callback in
    a try-catch block when calling an RPC method because of: 1.
    Correctness: In case the communication problem first occurs in that
    call, the callback will be called twice, once by
    notify_disconnected() and once manually. 2. Consistency: The
    callback is called from one place regardless of whether the
    communication problem is first detected during that call or not.
    (LP: #1201436)
  * graphics: Improve si...

Read more...

Changed in mir (Ubuntu):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers