Mir

UnresponsiveClient.does_not_hang_server hangs server

Bug #1586382 reported by Andreas Pokorny on 2016-05-27
This bug affects 3 people
Affects        Status    Importance  Assigned to  Milestone
Mir            Triaged   High        Unassigned
mir (Ubuntu)   Triaged   High        Unassigned

Bug Description

10:35:12 [ RUN ] ServerWithoutActiveOutputs.creates_valid_client_surface
10:35:12 [ OK ] ServerWithoutActiveOutputs.creates_valid_client_surface (51 ms)
10:35:12 [----------] 1 test from ServerWithoutActiveOutputs (51 ms total)
10:35:12
10:35:12 [----------] 2 tests from ServerStartup
10:35:12 [ RUN ] ServerStartup.creates_endpoint_on_filesystem
10:35:12 [ OK ] ServerStartup.creates_endpoint_on_filesystem (63 ms)
10:35:12 [ RUN ] ServerStartup.after_server_sigkilled_can_start_new_instance
10:35:13 [ OK ] ServerStartup.after_server_sigkilled_can_start_new_instance (110 ms)
10:35:13 [----------] 2 tests from ServerStartup (173 ms total)
10:35:13
10:35:13 [----------] 1 test from ServerStartupReliability
10:35:13 [ RUN ] ServerStartupReliability.can_start_with_low_entropy
10:35:13 [ OK ] ServerStartupReliability.can_start_with_low_entropy (38 ms)
10:35:13 [----------] 1 test from ServerStartupReliability (38 ms total)
10:35:13
10:35:13 [----------] 3 tests from DebugAPI
10:35:13 [ RUN ] DebugAPI.translates_surface_coordinates_to_screen_coordinates
10:35:13 [ OK ] DebugAPI.translates_surface_coordinates_to_screen_coordinates (56 ms)
10:35:13 [ RUN ] DebugAPI.is_unavailable_when_server_not_started_with_debug
10:35:13 [ OK ] DebugAPI.is_unavailable_when_server_not_started_with_debug (52 ms)
10:35:13 [ RUN ] DebugAPI.is_overrideable
10:35:13 [ OK ] DebugAPI.is_overrideable (53 ms)
10:35:13 [----------] 3 tests from DebugAPI (164 ms total)
10:35:13
10:35:13 [----------] 1 test from UnresponsiveClient
10:35:13 [ RUN ] UnresponsiveClient.does_not_hang_server
10:47:00 Build timed out (after 30 minutes). Marking the build as aborted.
10:47:00 Build was aborted
10:47:00 Archiving artifacts
10:47:00 [WS-CLEANUP] Deleting project workspace...[WS-CLEANUP] done
10:47:00 Finished: ABORTED

Happened at least twice this week on krillin.

Daniel van Vugt (vanvugt) wrote :

Only happens in this branch AFAIK:
https://code.launchpad.net/~kdub/mir/fix-1577967/+merge/294283

So possibly not a bug in lp:mir

Changed in mir:
status: New → Incomplete

Launchpad Janitor (janitor) wrote :

[Expired for Mir because there has been no activity for 60 days.]

Changed in mir:
status: Incomplete → Expired

Alan Griffiths (alan-griffiths) wrote :

https://mir-jenkins.ubuntu.com/job/device-runtests-mir/device_type=krillin/1397/consoleFull

11:52:24 [ RUN ] UnresponsiveClient.does_not_hang_server
11:52:24 Detected attempt to close a bad file-descriptor.
11:52:24 This usually indicates a double-close bug.
11:52:24 The bad file descriptor was: 31
12:07:39 Build timed out (after 35 minutes). Marking the build as aborted.
12:07:39 Build was aborted
12:07:39 Archiving artifacts
12:07:39 Terminated
12:07:40 [WS-CLEANUP] Deleting project workspace...[WS-CLEANUP] done
12:07:40 Finished: ABORTED
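
For readers unfamiliar with the diagnostic in the log above: a double close happens when two parts of the code each believe they own the same descriptor. What follows is only a minimal sketch, assuming a POSIX system. The function name `errno_of_second_close` is made up for illustration and is not taken from Mir, and the sketch does not claim to show where the double close happens in this bug.

```cpp
#include <cassert>
#include <cerrno>
#include <unistd.h>

// Hypothetical helper: closes the same descriptor twice and returns the
// errno reported by the second close() (0 if it unexpectedly succeeded).
int errno_of_second_close()
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;                    // could not set up the demonstration

    int const owner_a = fds[0];
    int const owner_b = fds[0];       // accidental second "owner" of the same fd

    int const first = close(owner_a); // legitimate close
    assert(first == 0);
    (void)first;

    errno = 0;
    int const second = close(owner_b);          // descriptor is no longer open
    int const err = (second == -1) ? errno : 0;

    close(fds[1]);                    // tidy up the other end of the pipe
    return err;
}
```

In a single-threaded run the second `close()` fails with `EBADF`. In a multi-threaded process the descriptor number may be reused between the two calls, in which case the second `close()` can silently close an unrelated descriptor, which is one way such a bug could turn into a hang rather than a clean error.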

Changed in mir:
status: Expired → Confirmed
importance: Undecided → Medium

Daniel van Vugt (vanvugt) wrote :

Also yesterday (from bug 1615512):

06:27:29 [ RUN ] UnresponsiveClient.does_not_hang_server
06:27:29 Detected attempt to close a bad file-descriptor.
06:27:29 This usually indicates a double-close bug.
06:27:29 The bad file descriptor was: 31
06:42:50 Build timed out (after 35 minutes). Marking the build as aborted.
06:42:50 Build was aborted

https://mir-jenkins.ubuntu.com/job/device-runtests-mir/1392/device_type=krillin/consoleFull

Alexandros Frantzis (afrantzis) wrote :

This is blocking CI, raising priority.

Changed in mir:
importance: Medium → Critical
tags: added: ci-blocker
Changed in mir:
assignee: nobody → Alexandros Frantzis (afrantzis)

Daniel van Vugt (vanvugt) wrote :

Yeah confirmed the only way to get a green light from CI right now is to delete the UnresponsiveClient tests:

https://code.launchpad.net/~vanvugt/mir/remove-UnresponsiveClient-test/+merge/303889

Changed in mir:
milestone: none → 0.25.0

Chris Halse Rogers (raof) wrote :

Has anyone been able to reproduce this outside of CI? I tried, and fixed what *I* hit, but I couldn't hit this problem.

Daniel van Vugt (vanvugt) wrote :

I *think* so yes. For the past few days 'make test' has hung indefinitely on acceptance tests on my desktop.

Daniel van Vugt (vanvugt) wrote :

I'm assuming it's the same issue...

Daniel van Vugt (vanvugt) wrote :

hung indefinitely /on more than 50% of attempts/ on acceptance tests on my desktop

Daniel van Vugt (vanvugt) wrote :

Workaround landed so this isn't a CI blocker any more.

Sadly bug 1616291 is still blocking autolandings.

tags: removed: ci-blocker
Changed in mir:
importance: Critical → High

> Has anyone been able to reproduce this outside of CI? I tried, and fixed what *I* hit,
> but I couldn't hit this problem.

I am able to very easily reproduce this with --gtest_repeat=N locally, and I have found the core of the problem. Unfortunately, this bug is non-trivial to fix since it requires refactoring of fd ownership on the client side (which I have already started working on).
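
To illustrate what "refactoring of fd ownership" could look like in general, here is a minimal sketch of a move-only descriptor owner. It is an assumption about the general technique, not the actual change being worked on: `OwnedFd` and `example_transfer` are hypothetical names and are not part of Mir's API.

```cpp
#include <cassert>
#include <unistd.h>
#include <utility>

// Hypothetical move-only owner: exactly one object may close a given fd.
class OwnedFd
{
public:
    OwnedFd() = default;
    explicit OwnedFd(int fd) : fd_{fd} {}
    ~OwnedFd() { reset(); }

    // Copying would create two owners, so it is disallowed.
    OwnedFd(OwnedFd const&) = delete;
    OwnedFd& operator=(OwnedFd const&) = delete;

    OwnedFd(OwnedFd&& other) noexcept : fd_{other.release()} {}
    OwnedFd& operator=(OwnedFd&& other) noexcept
    {
        if (this != &other)
        {
            reset();
            fd_ = other.release();
        }
        return *this;
    }

    int get() const noexcept { return fd_; }
    bool valid() const noexcept { return fd_ >= 0; }

    // Give up ownership without closing; the caller becomes responsible.
    int release() noexcept { return std::exchange(fd_, -1); }

    // Close the descriptor if one is owned; safe to call repeatedly.
    void reset() noexcept
    {
        if (fd_ >= 0)
        {
            ::close(fd_);
            fd_ = -1;
        }
    }

private:
    int fd_{-1};
};

// Usage sketch: ownership moves from `a` to `b`, so only `b` closes the fd.
inline void example_transfer(int raw_fd)
{
    OwnedFd a{raw_fd};
    OwnedFd b{std::move(a)};
    assert(!a.valid());
    assert(b.get() == raw_fd);
}   // `b` closes raw_fd here exactly once; `a` owns nothing.
```

With a type like this, a second owner has to be created explicitly (for example with `dup()`), so an accidental double close shows up as a compile error instead of a flaky test run.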

Changed in mir:
status: Confirmed → In Progress
Changed in mir:
milestone: 0.25.0 → 0.26.0

Daniel van Vugt (vanvugt) wrote :

Probably not in progress(?)

The offending test was disabled back in August because it was failing too much (and nobody was able to fix it, yet).

Changed in mir:
milestone: 0.26.0 → none
status: In Progress → Triaged

Daniel van Vugt (vanvugt) wrote :

Note to self and others: This is still the second-hottest CI failure despite the test having been disabled to work around it.

Changed in mir:
assignee: Alexandros Frantzis (afrantzis) → nobody

Michał Sawicz (saviq) wrote :

Syncing task from Mir.

Changed in mir (Ubuntu):
importance: Undecided → High
status: New → Triaged