Mir

Intermittent failure in mir_demo_client_multistream smoke test on krillin

Bug #1560909 reported by Alan Griffiths
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Mir
Fix Released
Critical
Alberto Aguirre
mir (Ubuntu)
Fix Released
Medium
Unassigned

Bug Description

Looks unrelated to the MP.

https://mir-jenkins.ubuntu.com/job/device-runtests-mir/device_type=krillin/225/console

06:50:15 [2016-03-22 06:50:15.228323] GLRenderer: GL max texture size = 4096
06:50:15 [2016-03-22 06:50:15.228437] GLRenderer: GL framebuffer bits: RGBA=8888, depth=24, stencil=8
06:50:18 libevdev error in fix_invalid_absinfo: BUG: Device "mtk-tpd" has invalid ABS_MT_TRACKING_ID rangeQuitting; have a nice day.
06:50:19 [2016-03-22 06:50:19.212164] mirserver: Stopping
06:50:19 I: [FAILED] mir_demo_client_multistream
06:50:19 [timestamp] End : mir_demo_client_multistream 2016-03-22T06:50:19+0000

Seems the client never acknowledges the SIGTERM

Related branches

summary: - Intermittent failure in smoke tests
+ Intermittent failure in mir_demo_client_multistream smoke tests
Revision history for this message
Alexandros Frantzis (afrantzis) wrote : Re: Intermittent failure in mir_demo_client_multistream smoke tests

Seen again here: https://mir-jenkins.ubuntu.com/job/device-runtests-mir/1364/device_type=krillin/consoleFull

I am attaching the crash file from the run so we have it around when the job disappears.

summary: - Intermittent failure in mir_demo_client_multistream smoke tests
+ Intermittent failure in mir_demo_client_multistream smoke tests on
+ krillin
tags: added: krillin
Revision history for this message
Daniel van Vugt (vanvugt) wrote : Re: Intermittent failure in mir_demo_client_multistream smoke tests on krillin

Again today:

Quitting; have a nice day.
[2016-08-29 07:26:44.574426] server_example_test_client.cpp: Terminating client
[2016-08-29 07:26:44.574662] mirserver: Stopping
I: [FAILED] mir_demo_client_multistream

https://mir-jenkins.ubuntu.com/job/device-runtests-mir/1469/device_type=krillin/consoleText

Revision history for this message
Daniel van Vugt (vanvugt) wrote :
Changed in mir:
importance: Undecided → Medium
Changed in mir:
status: New → Confirmed
Revision history for this message
Daniel van Vugt (vanvugt) wrote :
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :
Download full text (6.3 KiB)

Could this be significant?

10:47:10 libevdev error in fix_invalid_absinfo: BUG: Device "mtk-tpd" has invalid ABS_MT_TRACKING_ID range[2017-01-18 10:47:10.473664] evdev-input: Added /dev/input/event0

Full log of test:

https://mir-jenkins.ubuntu.com/job/device-runtests-mir/device_type=krillin/2696/consoleFull

10:47:09 running client mir_demo_client_multistream
10:47:09 [timestamp] Start : mir_demo_client_multistream 2017-01-18T10:47:09+0000
10:47:09 mir_demo_server --test-timeout=3 --test-client mir_demo_client_multistream
10:47:10 [2017-01-18 10:47:10.040258] mirplatform: Found graphics driver: mir:android (version 0.26.0)
10:47:10 [2017-01-18 10:47:10.042185] mirplatform: Found graphics driver: mir:mesa-kms (version 0.26.0)
10:47:10 [2017-01-18 10:47:10.042654] mirplatform: Found graphics driver: mir:mesa-x11 (version 0.26.0)
10:47:10 [2017-01-18 10:47:10.043746] mirplatform: Found graphics driver: mir:stub-graphics (version 0.26.0)
10:47:10 [2017-01-18 10:47:10.044161] mirplatform: Found graphics driver: throw-on-creation (version 0.26.0)
10:47:10 [2017-01-18 10:47:10.050320] mirserver: Starting
10:47:10 [2017-01-18 10:47:10.051084] mircommon: Loading modules from: /usr/lib/arm-linux-gnueabihf/mir/server-platform
10:47:10 [2017-01-18 10:47:10.052114] mircommon: Loading module: /usr/lib/arm-linux-gnueabihf/mir/server-platform/graphics-android.so.11
10:47:10 [2017-01-18 10:47:10.052286] mircommon: Loading module: /usr/lib/arm-linux-gnueabihf/mir/server-platform/graphics-mesa-kms.so.11
10:47:10 [2017-01-18 10:47:10.057123] mircommon: Loading module: /usr/lib/arm-linux-gnueabihf/mir/server-platform/server-mesa-x11.so.11
10:47:10 [2017-01-18 10:47:10.062032] mircommon: Loading module: /usr/lib/arm-linux-gnueabihf/mir/server-platform/graphics-android.so.10
10:47:10 [2017-01-18 10:47:10.066610] mircommon: Loading module: /usr/lib/arm-linux-gnueabihf/mir/server-platform/input-evdev.so.6
10:47:10 [2017-01-18 10:47:10.070492] mircommon: Loading module: /usr/lib/arm-linux-gnueabihf/mir/server-platform/input-evdev.so.5
10:47:10 [2017-01-18 10:47:10.072988] mircommon: Loading module: /usr/lib/arm-linux-gnueabihf/mir/server-platform/graphics-dummy.so
10:47:10 [2017-01-18 10:47:10.075509] mircommon: Loading module: /usr/lib/arm-linux-gnueabihf/mir/server-platform/input-stub.so
10:47:10 [2017-01-18 10:47:10.077303] mircommon: Loading module: /usr/lib/arm-linux-gnueabihf/mir/server-platform/graphics-throw.so
10:47:10 [2017-01-18 10:47:10.078891] mirplatform: Found graphics driver: mir:android (version 0.26.0)
10:47:10 [2017-01-18 10:47:10.079979] mirplatform: Found graphics driver: mir:mesa-kms (version 0.26.0)
10:47:10 [2017-01-18 10:47:10.080149] mirplatform: Found graphics driver: mir:mesa-x11 (version 0.26.0)
10:47:10 [2017-01-18 10:47:10.081069] mirplatform: Found graphics driver: mir:stub-graphics (version 0.26.0)
10:47:10 [2017-01-18 10:47:10.081298] mirplatform: Found graphics driver: throw-on-creation (version 0.26.0)
10:47:10 [2017-01-18 10:47:10.083723] mirserver: Selected driver: mir:android (version 0.26.0)
10:47:10 [2017-01-18 10:47:10.115020] android/server: Found 1 matching egl configs
10:47:10 [2017-01-18 10:47:10.115683] android/server:...

Read more...

Revision history for this message
Alan Griffiths (alan-griffiths) wrote :
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :
Revision history for this message
Cemil Azizoglu (cemil-azizoglu) wrote :

Bump the priority to critical. We need CI - don't wanna land things by hand at this critical time.

Changed in mir:
importance: Medium → Critical
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :
Kevin DuBois (kdub)
Changed in mir:
assignee: nobody → Kevin DuBois (kdub)
Revision history for this message
Kevin DuBois (kdub) wrote :

Was able to reproduce maybe 1/15 times, if that.
The failure is at this line:
12:48:17 [2017-01-18 12:48:16.973617] server_example_test_client.cpp: Terminating client
where the test harness has to forcibly terminate the client, which probably means a shutdown hang/long wait.

normally, the message is something like:
[2017-01-18 10:01:23.676158] server_example_test_client.cpp: Client exited successfully

When the client fails, everything appears normally on screen, just seems to be a shutdown issue.

Revision history for this message
Kevin DuBois (kdub) wrote :

Also, have only been able to reproduce with the same command the test harness uses:
mir_demo_server --test-timeout=3 --test-client mir_demo_client_multistream

I haven't yet seen any hang during shutdown when manually starting and stopping the client.

Revision history for this message
Kevin DuBois (kdub) wrote :

Just a hunch, but this particular demo client can do a lot of stuff (creating windows, swapping, etc) in between getting sigterm and exiting. I'm guessing that because we can now sleep during swap, maybe we sleep, and the kernel switches over back to the server, which (because of WNOHANG), will go and sigkill the process (even though, the process just got interrupted by the scheduler during shutdown, and would exit, given a bit more time)

Kevin DuBois (kdub)
Changed in mir:
milestone: none → 0.26.0
Revision history for this message
Andreas Pokorny (andreas-pokorny) wrote :

> 10:47:10 libevdev error in fix_invalid_absinfo: BUG: Device "mtk-tpd" has invalid ABS_MT_TRACKING_ID range[2017-01-18 10:47:10.473664] evdev-input: Added /dev/input/event0

This just means that the range reported is invalid - but libinput ignores that and assumes MAX_INT which has no negative consequences.

Changed in mir:
status: Confirmed → In Progress
milestone: 0.26.0 → 1.0.0
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I'm not so sure this is critical while we have several hotter testsfail bugs, but a fix is imminent anyway...

https://bugs.launchpad.net/mir/+bugs?field.tag=testsfail&orderby=-heat&start=0

Revision history for this message
Daniel van Vugt (vanvugt) wrote : Re: Intermittent failure in mir_demo_client_multistream/mir_demo_client_animated_cursor smoke tests on krillin
summary: - Intermittent failure in mir_demo_client_multistream smoke tests on
- krillin
+ Intermittent failure in
+ mir_demo_client_multistream/mir_demo_client_animated_cursor smoke tests
+ on krillin
Revision history for this message
Alan Griffiths (alan-griffiths) wrote : Re: Intermittent failure in mir_demo_client_multistream/mir_demo_client_animated_cursor/mir_demo_client_eglstateswitcher smoke tests on krillin

13:35:18 Resized to 540x960
13:35:19 [2017-01-20 13:35:19.379624] server_example_test_client.cpp: Terminating client
13:35:19 [2017-01-20 13:35:19.379858] mirserver: Stopping
13:35:19 I: [FAILED] mir_demo_client_eglstateswitcher

https://mir-jenkins.ubuntu.com/job/device-runtests-mir/device_type=krillin/2750/console

summary: Intermittent failure in
- mir_demo_client_multistream/mir_demo_client_animated_cursor smoke tests
- on krillin
+ mir_demo_client_multistream/mir_demo_client_animated_cursor/mir_demo_client_eglstateswitcher
+ smoke tests on krillin
Revision history for this message
Alberto Aguirre (albaguirre) wrote :

I tracked this down to the client hanging waiting for mir_connection_release.

Digging further, the root cause is how surface release is handled:

Thread 1 (T1)
 1. calls MirConnection::release_surface
 2. calls mir_connection_release
 3. Makes rpc channel avoid sending any more messages to the server
 4. Waits for server callback (waithandle signal)
 5. ~MirConnection() runs
 5b waits for outstanding rpc calls

Thread 2 (T2)
 1. calls MirConnection::released(SurfaceRelease data)
 2. signals the wait handle
 3. then proceeds to erase BufferStreamIds and SurfaceIds from the surface_map
 4. ^ that causes BufferStream (or streams) to be destroyed which in turn destroy the associated
    BufferDepository and BufferVault
 5. ~BufferVault checks for disconnection. If no disconnection, starts freeing buffers
 5b. RPC calls are made, so pending callbacks are tracked

So dependending on timing, T2.5b could have sneaked an rpc call before T1.3, in which case, there will be a pending callback that will never be satisified, making T1 hang in 5b.

 -

Changed in mir:
assignee: Kevin DuBois (kdub) → Alberto Aguirre (albaguirre)
Revision history for this message
Daniel van Vugt (vanvugt) wrote :
Revision history for this message
Daniel van Vugt (vanvugt) wrote :
Revision history for this message
Daniel van Vugt (vanvugt) wrote :
Revision history for this message
Daniel van Vugt (vanvugt) wrote :
Revision history for this message
Alan Griffiths (alan-griffiths) wrote :

https://mir-jenkins.ubuntu.com/job/device-runtests-mir/device_type=krillin/2847/console

12:59:12 Requesting state: mir_window_state_maximized
12:59:12 Resized to 540x960
12:59:13 [2017-01-25 12:59:13.101915] server_example_test_client.cpp: Terminating client
12:59:13 [2017-01-25 12:59:13.102088] mirserver: Stopping
12:59:13 I: [FAILED] mir_demo_client_eglstateswitcher

Revision history for this message
Alberto Aguirre (albaguirre) wrote :

Separating failures of mir_demo_client_eglstateswitcher to avoid confusion

https://bugs.launchpad.net/mir/+bug/1659999

summary: - Intermittent failure in
- mir_demo_client_multistream/mir_demo_client_animated_cursor/mir_demo_client_eglstateswitcher
- smoke tests on krillin
+ Intermittent failure in mir_demo_client_multistream smoke test on
+ krillin
Changed in mir:
status: In Progress → Fix Committed
milestone: 1.0.0 → 0.26.0
Changed in mir:
status: Fix Committed → Fix Released
Changed in mir (Ubuntu):
status: New → Fix Released
importance: Undecided → Medium
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.