Mir

[regression] mir_acceptance_tests.NestedServer failure in clang CI

Bug #1430000 reported by Kevin DuBois
This bug affects 1 person
Affects       Status        Importance  Assigned to     Milestone
Mir           Fix Released  Medium      Alan Griffiths
mir (Ubuntu)  Fix Released  Medium      Unassigned

Bug Description

https://jenkins.qa.ubuntu.com/job/mir-clang-vivid-amd64-build/1565/parameters/?
and
https://jenkins.qa.ubuntu.com/job/mir-clang-vivid-amd64-build/1567/parameters/?

have both failed in:
        Start 26: mir_acceptance_tests.NestedServer.*
 26/287 Test #26: mir_acceptance_tests.NestedServer.* ...........................................***Exception: SegFault 0.25 sec

Alan Griffiths (alan-griffiths) wrote :

We had other errors appearing around the same time and in a suspiciously similar place:

https://jenkins.qa.ubuntu.com/job/mir-vivid-amd64-ci/1161/consoleFull

That run has NestedServer.client_may_connect_to_nested_server_and_create_surface
...

5: ==22094== Thread 16 Mir/Comp:
5: ==22094== Invalid read of size 8
5: ==22094== at 0x4F241AC: mir::frontend::detail::ProtobufResponder::send_response(unsigned int, google::protobuf::Message*, std::initializer_list<std::vector<mir::Fd, std::allocator<mir::Fd> > > const&) (protobuf_responder.cpp:56)
5: ==22094== by 0x4F1E828: mir::frontend::detail::ProtobufMessageProcessor::send_response(unsigned int, mir::protobuf::Buffer*) (protobuf_message_processor.cpp:294)

Alan Griffiths (alan-griffiths) wrote :

In investigating the above failure I found the suppressions file didn't match the stack trace on my machine (lp:~alan-griffiths/mir/fix-valgrind_suppressions-file/+merge/252413) but I don't see how that can be directly related to this segfault.

Kevin DuBois (kdub) wrote :
Daniel van Vugt (vanvugt) wrote :

I think I got gcc to do this too yesterday. But I could be wrong.

Alan Griffiths (alan-griffiths) wrote :

I saw gcc do this today, but sadly a core from a different run was in the way and I couldn't reproduce it.

Changed in mir:
importance: Undecided → Medium
milestone: none → 0.13.0
status: New → Confirmed
summary: - mir_acceptance_tests.NestedServer failure in clang CI
+ [regression] mir_acceptance_tests.NestedServer failure in clang CI
tags: added: regression
Daniel van Vugt (vanvugt) wrote :

The aforementioned send_response function changed recently in r2373, but the change looks safe.

I'm also wondering about r2379, but can't really do any serious bisecting right now because I can't seem to reproduce the bug at all...

Alan Griffiths (alan-griffiths) wrote :

I have a (g++ build) core to explore!!!

Program terminated with signal SIGSEGV, Segmentation fault.
#0 __GI_getenv (name=0x7f2994d5a4aa "OCKDEV_DIR") at getenv.c:85
85 getenv.c: No such file or directory.
(gdb) bt
#0 __GI_getenv (name=0x7f2994d5a4aa "OCKDEV_DIR") at getenv.c:85
#1 0x00007f2994d569e2 in ?? () from /usr/lib/x86_64-linux-gnu/libumockdev-preload.so.0
#2 0x00007f2994d56f94 in readlink () from /usr/lib/x86_64-linux-gnu/libumockdev-preload.so.0
#3 0x00007f299147a164 in udev_device_new_from_syspath () from /lib/x86_64-linux-gnu/libudev.so.1
#4 0x00007f2991475165 in ?? () from /lib/x86_64-linux-gnu/libudev.so.1
#5 0x00007f299147534e in ?? () from /lib/x86_64-linux-gnu/libudev.so.1
#6 0x00007f2991475843 in udev_enumerate_scan_devices () from /lib/x86_64-linux-gnu/libudev.so.1
#7 0x00007f2993efafcc in mir::udev::Enumerator::scan_devices (this=0x7f29867fa340)
    at /home/alan/display_server/mir3/src/platform/udev/udev_wrapper.cpp:181
#8 0x00007f29948e3cb2 in android::EventHub::scanDevicesLocked (this=0x7f29880099e0)
    at /home/alan/display_server/mir3/3rd_party/android-input/android/frameworks/base/services/input/EventHub.cpp:922
#9 0x00007f29948e32b3 in android::EventHub::getEvents (this=0x7f29880099e0, timeoutMillis=-1, buffer=0x7f2988088958,
    bufferSize=256)
    at /home/alan/display_server/mir3/3rd_party/android-input/android/frameworks/base/services/input/EventHub.cpp:700
#10 0x00007f299491a2df in android::InputReader::loopOnce (this=0x7f2988088800)
    at /home/alan/display_server/mir3/3rd_party/android-input/android/frameworks/base/services/input/InputReader.cpp:292
#11 0x00007f299491c729 in android::InputReaderThread::threadLoop (this=0x7f2988009f10)
    at /home/alan/display_server/mir3/3rd_party/android-input/android/frameworks/base/services/input/InputReader.cpp:862
#12 0x00007f2994903f38 in mir_input::Thread::run(char const*, int, unsigned long)::{lambda()#1}::operator()() const (
    __closure=0x7f298808abc8) at /home/alan/display_server/mir3/3rd_party/android-deps/std/Thread.h:70
#13 0x00007f299491740a in std::_Bind_simple<mir_input::Thread::run(char const*, int, unsigned long)::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>) (this=0x7f298808abc8) at /usr/include/c++/4.9/functional:1700
#14 0x00007f29949172e4 in std::_Bind_simple<mir_input::Thread::run(char const*, int, unsigned long)::{lambda()#1} ()>::operator()() (this=0x7f298808abc8) at /usr/include/c++/4.9/functional:1688
#15 0x00007f29949171ee in std::thread::_Impl<std::_Bind_simple<mir_input::Thread::run(char const*, int, unsigned long)::{lambda()#1} ()> >::_M_run() (this=0x7f298808abb0) at /usr/include/c++/4.9/thread:115
#16 0x00007f2992760e30 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#17 0x00007f29939830a5 in start_thread (arg=0x7f29867fc700) at pthread_create.c:309
#18 0x00007f29921c557d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Alan Griffiths (alan-griffiths) wrote :

I have a theory (but not yet a good plan to test it).

1. umockdev isn't thread safe; and
2. nested servers can try to access hardware rather than getting input from the host server (which I thought they did); and
3. sometime recently we started running acceptance tests under umockdev (which I wasn't aware of); hence,
4. when testing two servers in the same process they both access umockdev (but on different threads)

which sometimes "goes boom"
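
One way to test a theory like this is to serialize every call into the suspect library behind a single process-wide mutex and see whether the crash goes away. The following is a minimal C++ sketch of that idea only. `scan_devices_unsafe` is a hypothetical stand-in for a call that is not thread safe (it is not a umockdev or Mir function), and the two threads stand in for the host and nested servers running in the same test process.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a library call that is not thread safe:
// it mutates shared state without any locking of its own.
static std::vector<int> shared_state;

void scan_devices_unsafe(int id)
{
    shared_state.push_back(id);
}

// Process-wide lock. Every caller must go through the wrapper below.
static std::mutex library_guard;

void scan_devices_serialized(int id)
{
    std::lock_guard<std::mutex> lock{library_guard};
    scan_devices_unsafe(id);
}

// Two "servers" in one process, each scanning on its own thread.
// Returns the number of scans that were recorded.
std::size_t run_two_servers(int scans_per_server)
{
    assert(scans_per_server >= 0);
    shared_state.clear();

    auto server = [scans_per_server](int id)
    {
        for (int i = 0; i != scans_per_server; ++i)
            scan_devices_serialized(id);
    };

    std::thread host{server, 1};
    std::thread nested{server, 2};
    host.join();
    nested.join();
    return shared_state.size();
}
```

With the wrapper in place every scan is recorded exactly once. Calling `scan_devices_unsafe` directly from both threads instead would be a data race, which is the kind of intermittent failure the theory describes.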

Alan Griffiths (alan-griffiths) wrote :

Point 2 of my theory seems to be caused by a combination of mir::DefaultServerConfiguration::the_input_targeter() not checking if input is enabled and test_nested_mir.cpp having been changed so that "--enable-input off" is no longer supplied.
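
The kind of guard that appears to be missing can be sketched roughly as follows. This is an illustration only, not Mir's code: `Options`, `InputTargeter`, `NullInputTargeter`, `HardwareInputTargeter` and `make_input_targeter` are all hypothetical names. The only detail taken from this report is that an "enable input" option exists and ought to be consulted before anything that may touch hardware is constructed.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical options object; the field mirrors "--enable-input".
struct Options
{
    bool enable_input = true;
};

// Hypothetical interface, reduced to a name for demonstration.
struct InputTargeter
{
    virtual ~InputTargeter() = default;
    virtual std::string name() const = 0;
};

// Does nothing and never touches devices.
struct NullInputTargeter : InputTargeter
{
    std::string name() const override { return "null"; }
};

// Stand-in for an implementation that would access real input devices.
struct HardwareInputTargeter : InputTargeter
{
    std::string name() const override { return "hardware"; }
};

// The guard: consult the option before building the hardware-backed object.
std::shared_ptr<InputTargeter> make_input_targeter(Options const& options)
{
    if (!options.enable_input)
        return std::make_shared<NullInputTargeter>();

    return std::make_shared<HardwareInputTargeter>();
}
```

With `enable_input` set to false the factory hands back the null object, so a nested server configured that way would have no path to the device layer.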

Daniel van Vugt (vanvugt) wrote :

You mean this? What was wrong with it?

------------------------------------------------------------
revno: 2365 [merge]
author: Robert Carr <email address hidden>
committer: Tarmac
branch nick: development-branch
timestamp: Thu 2015-03-05 01:14:25 +0000
message:
  Improvements to nested server testing.

  Approved by Alan Griffiths, Kevin DuBois, PS Jenkins bot.
------------------------------------------------------------

Changed in mir:
status: Confirmed → Triaged
Changed in mir (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Changed in mir:
status: Triaged → In Progress
assignee: nobody → Alan Griffiths (alan-griffiths)
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:mir at revision None, scheduled for release in mir, milestone 0.13.0

Changed in mir:
status: In Progress → Fix Committed
Daniel van Vugt (vanvugt) wrote :

Still happening with r2409. The fix supposedly landed in r2403.

https://jenkins.qa.ubuntu.com/job/mir-clang-vivid-amd64-build/1726/consoleFull

Changed in mir:
status: Fix Committed → Triaged
Daniel van Vugt (vanvugt) wrote :

Also, still happening locally with gcc:

The following tests FAILED:
  27 - mir_acceptance_tests.NestedServer.* (SEGFAULT)
Errors while running CTest
Makefile:117: recipe for target 'test' failed

Changed in mir:
assignee: Alan Griffiths (alan-griffiths) → nobody
Alan Griffiths (alan-griffiths) wrote :

A slightly more detailed failure:

27: [ OK ] NestedServer.receives_lifecycle_events_from_host (24 ms)
27: [ RUN ] NestedServer.client_may_connect_to_nested_server_and_create_surface
 27/291 Test #27: mir_acceptance_tests.NestedServer.* ...........................................***Exception: SegFault 0.30 sec

https://jenkins.qa.ubuntu.com/job/mir-clang-vivid-amd64-build/1827/consoleFull

Alan Griffiths (alan-griffiths) wrote :

I've been re-running the NestedServer suite locally for a full day now without a single glitch. :(

Daniel van Vugt (vanvugt) wrote :

Here's one from Friday night:
https://jenkins.qa.ubuntu.com/job/mir-clang-vivid-amd64-build/1860/consoleFull

And I did see it occur locally for me (with gcc) on the same day.

PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:mir at revision None, scheduled for release in mir, milestone 0.13.0

Changed in mir:
status: Triaged → Fix Committed
Changed in mir:
assignee: nobody → Alan Griffiths (alan-griffiths)
Changed in mir:
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mir - 0.13.1+15.10.20150520-0ubuntu1

---------------
mir (0.13.1+15.10.20150520-0ubuntu1) wily; urgency=medium

  [ Cemil Azizoglu ]
  * New upstream release 0.13.1 (https://launchpad.net/mir/+milestone/0.13.1)
    - ABI summary: No ABI break. Servers and clients do not need rebuilding.
      . Mirclient ABI unchanged at 8
      . Mircommon ABI unchanged at 4
      . Mirplatform ABI unchanged at 7
      . Mirserver ABI unchanged at 31
    - Bug fixes:
      . Can't load app purchase UI without a U1 account (LP: #1450377)
      . Crash because uncaught exception in mir::events::add_touch (LP: #1437357)

 -- CI Train Bot <email address hidden> Wed, 20 May 2015 21:20:15 +0000

Changed in mir (Ubuntu):
status: Triaged → Fix Released