Mir

mir-ubuntu-vivid-armhf-ci fails consistently

Bug #1407863 reported by Cemil Azizoglu
This bug affects 1 person
Affects        Status        Importance  Assigned to          Milestone
Mir            Fix Released  Medium      Alexandros Frantzis
mir (Ubuntu)   Fix Released  Undecided   Unassigned

Bug Description

GLibMainLoopTest.propagates_exception_from_signal_handler [ FAILED ]
GLibMainLoopTest.propagates_exception_from_fd_handler [ FAILED ]
GLibMainLoopTest.propagates_exception_from_server_action [ FAILED ]
GLibMainLoopTest.can_be_rerun_after_exception [ FAILED ]
GLibMainLoopAlarmTest.propagates_exception_from_alarm [ FAILED ]
GLibMainLoopForkTest.handles_signals_when_created_in_forked_process [ FAILED ]
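Most of the failing tests check one contract: an exception thrown inside a handler dispatched by the main loop should stop the loop and surface from its run() call, and the loop should be usable again afterwards. The sketch below illustrates that contract in plain C++ with no GLib. MiniLoop, enqueue and run are hypothetical names; this is not Mir's GLibMainLoop API or implementation.

```cpp
#include <exception>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical, GLib-free stand-in for a main loop (NOT mir's GLibMainLoop).
class MiniLoop
{
public:
    void enqueue(std::function<void()> handler)
    {
        handlers.push(std::move(handler));
    }

    // Dispatches queued handlers in order. If a handler throws, the loop
    // stops dispatching, and the captured exception is rethrown from run()
    // so the caller sees it. Handlers not yet run stay queued.
    void run()
    {
        std::exception_ptr pending;
        while (!handlers.empty() && !pending)
        {
            auto handler = std::move(handlers.front());
            handlers.pop();
            try
            {
                handler();
            }
            catch (...)
            {
                pending = std::current_exception();
            }
        }
        if (pending)
            std::rethrow_exception(pending);
    }

private:
    std::queue<std::function<void()>> handlers;
};
```

Calling run() a second time after catching the exception dispatches the handlers that were still queued, which is roughly what a "can be rerun after exception" check asserts. The real tests drive actual GLib sources (signals, fds, alarms); the sketch only mirrors the expected behaviour.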

Tags: testsfail


Cemil Azizoglu (cemil-azizoglu) wrote :

Happens consistently in the mir 0.10 merge proposal: https://code.launchpad.net/~mir-team/mir/development-branch/+merge/245589

Doesn't/didn't happen in the silo.

Changed in mir:
importance: Undecided → High
tags: added: testsfail
Changed in mir:
milestone: none → 0.10.0
Alan Griffiths (alan-griffiths) wrote :

The problem is also seen with a "null" changeset: https://code.launchpad.net/~alan-griffiths/mir/test/+merge/245639

This would appear to indicate that it is a consequence of CI changes since the last release.

Daniel van Vugt (vanvugt) wrote :

The bug appears to have been around for a while. The same thing happened in November, threatening to block the 0.9.0 release, so we just merged it manually:
   https://code.launchpad.net/~mir-team/mir/0.9/+merge/242146

Changed in mir:
importance: High → Medium
Daniel van Vugt (vanvugt) wrote :

I think I've reproduced the bug locally. Edit the cross-compile script to force -DCMAKE_BUILD_TYPE=Coverage, then run the resulting tests on another host (e.g. a phone). The failures I get are simple gcov path lookup failures (the paths exist only on the build host, not on the test host):

[ RUN ] GLibMainLoopTest.propagates_exception_from_signal_handler
profiling:/home/dan:Cannot create directory
profiling:/home/dan/bzr/mir/cov/build-android-arm/3rd_party/xcursor/CMakeFiles/xcursorloader.dir/xcursor.c.gcda:Skip
profiling:/home/dan:Cannot create directory
...

It's not just GLibMainLoopTest though. If I disable that, the same issue appears in other tests.
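The "Cannot create directory" lines come from gcov-instrumented binaries trying to write .gcda files to the absolute paths baked in at build time. GCC's runtime can relocate those files via the GCOV_PREFIX and GCOV_PREFIX_STRIP environment variables: strip N leading directory components, then prepend the prefix. The helper below is a hypothetical model of that rewriting only (relocate_gcda_path is not a GCC or Mir function), assuming GCOV_PREFIX is set, to show where files like the xcursor.c.gcda above would land on a test host.

```cpp
#include <string>

// Hypothetical helper, not part of GCC or Mir: models how the gcov runtime
// relocates a .gcda file when GCOV_PREFIX is set. It drops `strip` leading
// directory components from the absolute build-time path, then prepends `prefix`.
std::string relocate_gcda_path(std::string const& abs_path,
                               std::string const& prefix,
                               int strip)
{
    std::string rest = abs_path;
    for (int i = 0; i < strip; ++i)
    {
        // Find the '/' that starts the second component; stop if there is
        // none, so the file name itself is never stripped.
        auto const next = rest.find('/', 1);
        if (next == std::string::npos)
            break;
        rest = rest.substr(next);
    }
    return prefix + rest;
}
```

For example, with a prefix of /tmp/gcov and a strip count of 6, the build-host path /home/dan/bzr/mir/cov/build-android-arm/3rd_party/... maps to /tmp/gcov/3rd_party/..., a directory that can be created on the phone.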

summary: - GLibMainLoopTest fails
+ Tests fail on armhf with CMAKE_BUILD_TYPE=Coverage
Daniel van Vugt (vanvugt) wrote : Re: mir-ubuntu-vivid-armhf-ci fails consistently (broken gcov support)

Oh, of course. The failing job "mir-ubuntu-vivid-armhf-ci" is one we don't run on merge proposals to lp:mir, only on proposals to lp:mir/ubuntu.

:P

summary: - Tests fail on armhf with CMAKE_BUILD_TYPE=Coverage
+ mir-ubuntu-vivid-armhf-ci fails consistently (broken gcov support)
Alan Griffiths (alan-griffiths) wrote :

I'm not convinced that you're seeing the same problem:

Looking at the console output (https://jenkins.qa.ubuntu.com/job/mir-ubuntu-vivid-armhf-ci/14/consoleFull) it doesn't appear that the tests are being run on a different host to the build. (And it looks as though gcovr is installed and detected correctly.)

Daniel van Vugt (vanvugt) wrote :

Maybe so, but it is suspicious that the technique in comment #4 makes the same tests fail.

Alan Griffiths (alan-griffiths) wrote :

Sure, we've seen various problems with the toolchain integration on armhf. If dropping gcovr from this job works, I don't think we need to be too concerned in the short term.

Daniel van Vugt (vanvugt) wrote :

It's also a little bit suspicious that we're hitting the Valgrind unhandled instruction problem at exactly the same time:

[ RUN ] GLibMainLoopTest.propagates_exception_from_signal_handler
==15334==
==15334== HEAP SUMMARY:
==15334== in use at exit: 32,289 bytes in 522 blocks
==15334== total heap usage: 31,911 allocs, 31,389 frees, 1,658,903 bytes allocated
==15334==
==15334== LEAK SUMMARY:
==15334== definitely lost: 0 bytes in 0 blocks
==15334== indirectly lost: 0 bytes in 0 blocks
==15334== possibly lost: 5,836 bytes in 159 blocks
==15334== still reachable: 26,453 bytes in 363 blocks
==15334== suppressed: 0 bytes in 0 blocks
==15334== Reachable blocks (those to which a pointer was found) are not shown.
==15334== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==15334==
==15334== For counts of detected and suppressed errors, rerun with: -v
==15334== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 3 from 3)
unknown file: Failure
C++ exception with description "Timeout while waiting for child to change state" thrown in the test body.
disInstr(thumb): unhandled instruction: 0xDEFF 0xF893
[ FAILED ] GLibMainLoopTest.propagates_exception_from_signal_handler (5953 ms)

[https://jenkins.qa.ubuntu.com/job/mir-ubuntu-vivid-armhf-ci/21/consoleFull]

Unhandled instructions will (or at least should) lead to test failures, of course.

Changed in mir:
milestone: 0.10.0 → 0.11.0
Alexandros Frantzis (afrantzis) wrote :

It's also interesting that a bit before the GLibMainLoop test failures we get some memory errors in other tests. Perhaps the stack has been corrupted?

8: [ RUN ] GMock.return_by_move
8: ==10949== Invalid write of size 4
8: ==10949== at 0x4D7AC06: ??? (in /lib/arm-linux-gnueabihf/libgcc_s.so.1)
8: ==10949== Address 0xbdb8c740 is on thread 1's stack
8: ==10949== 16 bytes below stack pointer
8: ==10949==
8: [ OK ] GMock.return_by_move (431 ms)
8: [----------] 1 test from GMock (496 ms total)

8: [ RUN ] RecursiveReadWriteMutex.can_be_read_locked_on_multiple_threads
8: ==10949== Thread 2:
8: ==10949== Invalid write of size 4

Changed in mir:
assignee: nobody → Alexandros Frantzis (afrantzis)
Alexandros Frantzis (afrantzis) wrote :

> It's also interesting that a bit before the GLibMainLoop test failures we get some memory errors in other tests

Note that I am referring to the latest instances of this bug:

http://jenkins.qa.ubuntu.com/job/mir-mediumtests-vivid-touch/980/console
http://jenkins.qa.ubuntu.com/job/mir-mediumtests-vivid-touch/978/console
http://jenkins.qa.ubuntu.com/job/mir-mediumtests-vivid-touch/977/console

Alexandros Frantzis (afrantzis) wrote :

Another interesting data point is that both gcc and libglib were upgraded when the failure started to occur:

build 976 (the last build that succeeds): gcc 4.9.2-10ubuntu1 , libglib2.0 2.43.2-1ubuntu1
build 977 (the first build that fails): gcc 4.9.2-10ubuntu2 , libglib2.0 2.43.3-1

Changed in mir:
status: New → In Progress
Alexandros Frantzis (afrantzis) wrote :

I can reproduce the issue locally. The problem seems to be that valgrind is very slow when dealing with forks. Increasing the timeout fixes it locally.
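The diagnosis is that valgrind slows fork-heavy tests enough to trip the test's deadline, which shows up as "Timeout while waiting for child to change state" in the log above. Below is a minimal sketch, assuming a POSIX host, of a polling wait whose window can be widened by a slowdown factor. wait_for_child and slowdown are hypothetical names, and this is not the change that was actually merged into Mir.

```cpp
#include <chrono>
#include <stdexcept>
#include <sys/types.h>
#include <sys/wait.h>
#include <thread>

// Hypothetical helper, not Mir's test code. Polls (WNOHANG) until child `pid`
// terminates or `timeout * slowdown` elapses; stopped/continued states are not
// tracked. A larger `slowdown` gives slow environments such as valgrind more time.
int wait_for_child(pid_t pid, std::chrono::milliseconds timeout, int slowdown = 1)
{
    using clock = std::chrono::steady_clock;
    auto const deadline = clock::now() + timeout * slowdown;

    while (clock::now() < deadline)
    {
        int status = 0;
        pid_t const result = waitpid(pid, &status, WNOHANG);
        if (result == pid)
            return status;  // child terminated; caller inspects with WIFEXITED etc.
        if (result == -1)
            throw std::runtime_error("waitpid failed");
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    throw std::runtime_error("Timeout while waiting for child to change state");
}
```

Scaling a base timeout, rather than hard-coding a larger one, keeps normal runs fast while letting a run under valgrind pass a factor such as 10.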

summary: - mir-ubuntu-vivid-armhf-ci fails consistently (broken gcov support)
+ mir-ubuntu-vivid-armhf-ci fails consistently
Changed in mir:
milestone: 0.11.0 → 0.12.0
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:mir at revision None, scheduled for release in mir, milestone 0.12.0

Changed in mir:
status: In Progress → Fix Committed
Daniel van Vugt (vanvugt) wrote :

Merged into lp:mir/0.11 at revision 2283.

Changed in mir:
milestone: 0.12.0 → 0.11.0
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mir - 0.11.0+15.04.20150209.1-0ubuntu1

---------------
mir (0.11.0+15.04.20150209.1-0ubuntu1) vivid; urgency=medium

  [ Daniel van Vugt ]
  * New upstream release 0.11.0 (https://launchpad.net/mir/+milestone/0.11.0)
    - Enhancements:
      . Lots more major plumbing in the Android code, on the path to
        supporting external displays.
      . Add support for clang 3.6.
      . Major redesign of server classes in mir::shell,scene and friends
        (still in progress).
      . Added client API for creating dialogs and tooltips.
      . Added new surface states: mir_surface_state_hidden and
        mir_surface_state_horizmaximized.
      . Performance: Use optimally efficient fragment shading when possible.
      . Performance: (Desktop) Composite using double buffering instead of
        triple to reduce visible lag.
      . mir_proving_server: Can now resize windows from any edge or corner
        using the existing Alt+middlebuttondrag.
      . mir_proving_server: Added some demo custom shaders (negative and
        high contrast modes: Super+N/C).
      . mir_proving_server: Can now close clients politely via Alt+F4.
      . Added MirPointerInputEvent (part of the new input API, the old
        MirMotionEvent is still supported also for now).
    - ABI summary: Servers need rebuilding, but clients do not;
      . Mirclient ABI unchanged at 8
      . Mircommon ABI unchanged at 3
      . Mirplatform ABI bumped to 6
      . Mirserver ABI bumped to 29
    - Bug fixes:
      . [regression] mir_demo_server exits immediately with boost
        bad_any_cast exception (LP: #1414630)
      . need way to position menus and tooltips (relative positioning to
        parent) (LP: #1324101)
      . GLibMainLoopTest failure seen in CI (LP: #1413748)
      . Clang builds fail in CI (LP: #1416317)
      . segfault in mir::compositor::GLProgramFamily::Shader::init()
        (LP: #1416482)
      . GLRenderer: The default fragment shader is sub-optimal for alpha=1.0
        (LP: #1350674)
      . mesa::DisplayBuffer::post_update is triple buffered - more laggy than
        it needs to be (LP: #1350725)
      . Cannot connect to nested server when started from a different vt
        (LP: #1379266)
      . [testfail] AsioMainLoopAlarmTest fails in CI (LP: #1392256)
      . Compositor report inconsistently reports frame time during bypass,
        and render time otherwise (LP: #1408906)
      . [regression] mir_demo_client_fingerpaint doesn't paint anything any
        more (with the mouse) (LP: #1413139)
      . Hardware cursor is always slightly ahead of the composited image
        (LP: #1274408)
      . integration tests are outputting (too many) DisplayServer log
        messages (LP: #1408231)
      . [regression] deploy-and-test.sh doesn't work any more (unless you
        have umockdev installed already) (LP: #1413479)
      . Color Inverse on display. Toggle Negative Image (LP: #1400580)
      . mir-ubuntu-vivid-armhf-ci fails consistently (LP: #1407863)
      . Double-buffered surfaces may lag or freeze if event driven and not
        constantly redrawing (LP: #1395581)
      . Pointer motion and crossing events...


Changed in mir (Ubuntu):
status: New → Fix Released
Changed in mir:
status: Fix Committed → Fix Released