UnresponsiveClient.does_not_hang_server hangs server
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Mir | Triaged | High | Unassigned | | |
mir (Ubuntu) | | High | Unassigned | | |
Bug Description
10:35:12 [ RUN ] ServerWithou
10:35:12 [ OK ] ServerWithou
10:35:12 [----------] 1 test from ServerWithoutAc
10:35:12
10:35:12 [----------] 2 tests from ServerStartup
10:35:12 [ RUN ] ServerStartu
10:35:12 [ OK ] ServerStartu
10:35:12 [ RUN ] ServerStartu
10:35:13 [ OK ] ServerStartu
10:35:13 [----------] 2 tests from ServerStartup (173 ms total)
10:35:13
10:35:13 [----------] 1 test from ServerStartupRe
10:35:13 [ RUN ] ServerStartu
10:35:13 [ OK ] ServerStartu
10:35:13 [----------] 1 test from ServerStartupRe
10:35:13
10:35:13 [----------] 3 tests from DebugAPI
10:35:13 [ RUN ] DebugAPI.
10:35:13 [ OK ] DebugAPI.
10:35:13 [ RUN ] DebugAPI.
10:35:13 [ OK ] DebugAPI.
10:35:13 [ RUN ] DebugAPI.
10:35:13 [ OK ] DebugAPI.
10:35:13 [----------] 3 tests from DebugAPI (164 ms total)
10:35:13
10:35:13 [----------] 1 test from UnresponsiveClient
10:35:13 [ RUN ] Unresponsive
10:47:00 Build timed out (after 30 minutes). Marking the build as aborted.
10:47:00 Build was aborted
10:47:00 Archiving artifacts
10:47:00 [WS-CLEANUP] Deleting project workspace.
10:47:00 Finished: ABORTED
Happened at least twice this week on krillin.
Related branches
- Mir CI Bot: Approve (continuous-integration) on 2016-08-25
- Daniel van Vugt: Disapprove on 2016-08-25
- Diff: 12 lines (+1/-1), 1 file modified: tests/acceptance-tests/CMakeLists.txt (+1/-1)
- Mir CI Bot: Needs Fixing (continuous-integration) on 2016-08-25
- Daniel van Vugt: Approve on 2016-08-25
- Diff: 12 lines (+1/-1), 1 file modified: tests/acceptance-tests/test_unresponsive_client.cpp (+1/-1)
Launchpad Janitor (janitor) wrote : | #2 |
[Expired for Mir because there has been no activity for 60 days.]
Changed in mir: | |
status: | Incomplete → Expired |
Alan Griffiths (alan-griffiths) wrote : | #3 |
https:/
11:52:24 [ RUN ] Unresponsive
11:52:24 Detected attempt to close a bad file-descriptor.
11:52:24 This usually indicates a double-close bug.
11:52:24 The bad file descriptor was: 31
12:07:39 Build timed out (after 35 minutes). Marking the build as aborted.
12:07:39 Build was aborted
12:07:39 Archiving artifacts
12:07:39 Terminated
12:07:40 [WS-CLEANUP] Deleting project workspace.
12:07:40 Finished: ABORTED
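For readers unfamiliar with the "double-close" message in the log above, here is a minimal sketch of what such a bug looks like at the POSIX level. It is not Mir code: `DoubleCloseResult` and `close_twice()` are hypothetical names made up for illustration, and how the test run actually detects the bad close is not shown in this report. The sketch only demonstrates that a second `close()` on the same descriptor fails with `EBADF`.

```cpp
#include <cerrno>
#include <unistd.h>

// Hypothetical helper type: records what happened when one descriptor
// was closed twice.
struct DoubleCloseResult
{
    int first_rc;      // return value of the first close()
    int second_rc;     // return value of the second close()
    int second_errno;  // errno observed after the second close()
};

// Opens a pipe, then closes its read end twice on purpose.
DoubleCloseResult close_twice()
{
    int fds[2];
    if (pipe(fds) != 0)
        return {-1, -1, errno};

    DoubleCloseResult result{};
    result.first_rc = close(fds[0]);   // legitimate close
    errno = 0;
    result.second_rc = close(fds[0]);  // deliberate bug: already closed
    result.second_errno = errno;

    close(fds[1]);                     // tidy up the write end
    return result;
}
```

In this single-threaded sketch the second `close()` simply fails. In a multi-threaded process the descriptor number may have been reused by another thread in between, in which case the second `close()` would succeed and close a descriptor belonging to someone else, which is one way such a bug could plausibly lead to a hang rather than a clean error.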
Changed in mir: | |
status: | Expired → Confirmed |
importance: | Undecided → Medium |
Daniel van Vugt (vanvugt) wrote : | #4 |
Also yesterday (from bug 1615512):
06:27:29 [ RUN ] Unresponsive
06:27:29 Detected attempt to close a bad file-descriptor.
06:27:29 This usually indicates a double-close bug.
06:27:29 The bad file descriptor was: 31
06:42:50 Build timed out (after 35 minutes). Marking the build as aborted.
06:42:50 Build was aborted
https:/
Alexandros Frantzis (afrantzis) wrote : | #5 |
This is blocking CI, raising priority.
Changed in mir: | |
importance: | Medium → Critical |
tags: | added: ci-blocker |
Changed in mir: | |
assignee: | nobody → Alexandros Frantzis (afrantzis) |
Daniel van Vugt (vanvugt) wrote : | #6 |
Yeah, confirmed: the only way to get a green light from CI right now is to delete the UnresponsiveClient tests:
https:/
Changed in mir: | |
milestone: | none → 0.25.0 |
Chris Halse Rogers (raof) wrote : | #7 |
Has anyone been able to reproduce this outside of CI? I tried, and fixed what *I* hit, but I couldn't hit this problem.
Daniel van Vugt (vanvugt) wrote : | #8 |
I *think* so yes. For the past few days 'make test' has hung indefinitely on acceptance tests on my desktop.
Daniel van Vugt (vanvugt) wrote : | #9 |
I'm assuming it's the same issue...
Daniel van Vugt (vanvugt) wrote : | #10 |
hung indefinitely /on more than 50% of attempts/ on acceptance tests on my desktop
Daniel van Vugt (vanvugt) wrote : | #11 |
Workaround landed so this isn't a CI blocker any more.
Sadly bug 1616291 is still blocking autolandings.
tags: | removed: ci-blocker |
Changed in mir: | |
importance: | Critical → High |
Alexandros Frantzis (afrantzis) wrote : | #13 |
> Has anyone been able to reproduce this outside of CI? I tried, and fixed what *I* hit,
> but I couldn't hit this problem.
I am able to very easily reproduce this with --gtest_repeat=N locally, and I have found the core of the problem. Unfortunately, this bug is non-trivial to fix since it requires refactoring of fd ownership on the client side (which I have already started working on).
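To illustrate what "fd ownership" can mean here, below is a minimal sketch of a move-only owner that closes its descriptor exactly once. It is an assumption about the general technique, not the refactoring actually done in Mir: `OwnedFd` and `is_open()` are hypothetical names introduced only for this example.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>
#include <utility>

// Hypothetical move-only owner of a file descriptor. Exactly one
// OwnedFd is responsible for a descriptor at any time, so it cannot
// be closed twice through this class.
class OwnedFd
{
public:
    OwnedFd() = default;
    explicit OwnedFd(int fd) : fd_{fd} {}

    // Copying would create two owners of one descriptor, so forbid it.
    OwnedFd(OwnedFd const&) = delete;
    OwnedFd& operator=(OwnedFd const&) = delete;

    // Moving transfers ownership and leaves the source empty (-1).
    OwnedFd(OwnedFd&& other) noexcept : fd_{std::exchange(other.fd_, -1)} {}
    OwnedFd& operator=(OwnedFd&& other) noexcept
    {
        if (this != &other)
        {
            reset();
            fd_ = std::exchange(other.fd_, -1);
        }
        return *this;
    }

    ~OwnedFd() { reset(); }

    int get() const { return fd_; }
    bool valid() const { return fd_ >= 0; }

    // Closes the descriptor if one is held, then marks this owner empty.
    void reset()
    {
        if (fd_ >= 0)
        {
            ::close(fd_);
            fd_ = -1;
        }
    }

private:
    int fd_{-1};
};

// Helper used only to observe the effect: true if fd refers to an
// open descriptor in this process.
bool is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```

With a type like this, code that hands a descriptor to another component has to `std::move` it, which makes it explicit who is expected to close it. Raw `int` descriptors passed around freely give no such guarantee.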
Changed in mir: | |
status: | Confirmed → In Progress |
Changed in mir: | |
milestone: | 0.25.0 → 0.26.0 |
Daniel van Vugt (vanvugt) wrote : | #14 |
Probably not in progress(?)
The offending test was disabled back in August because it was failing too much (and nobody was able to fix it, yet).
Changed in mir: | |
milestone: | 0.26.0 → none |
status: | In Progress → Triaged |
Daniel van Vugt (vanvugt) wrote : | #15 |
Note to self and others: This is still the second-hottest CI failure despite the test having been disabled to work around it.
Changed in mir: | |
assignee: | Alexandros Frantzis (afrantzis) → nobody |
Michał Sawicz (saviq) wrote : | #16 |
Syncing task from Mir.
Changed in mir (Ubuntu): | |
importance: | Undecided → High |
status: | New → Triaged |
Only happens in this branch AFAIK: https://code.launchpad.net/~kdub/mir/fix-1577967/+merge/294283
https:/
So possibly not a bug in lp:mir