Random mir failures running unity8 shell during AP tests [Mir throws exception: what(): error during hwc set()]

Bug #1262982 reported by Martin Pitt
This bug affects 3 people
Affects            Status     Importance  Assigned to  Milestone
Mir                Triaged    High        Unassigned
unity-mir          Confirmed  High        Unassigned
qtubuntu (Ubuntu)  Invalid    Undecided   Unassigned

Bug Description

In autopilot tests, sometimes clients fail to start (crash) causing the test to fail. Considering an example test output:
https://jenkins.qa.ubuntu.com/job/generic-mediumtests-runner-mako/4469/
the unity8 log file:
https://jenkins.qa.ubuntu.com/job/generic-mediumtests-runner-mako/4469/artifact/results/log/unity8.log
does not contain a "REJECT" string - which would indicate unity-mir/unity8 rejects the client connection. So the connection is failing for another reason.

However, the log file shows many unity8/mir crashes, which eventually leave things unrecoverable. The bad crashes appear to be:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::runtime_error> >'
  what(): error during hwc set()

and then on next invocation:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::logic_error> >'
  what(): compositor_acquire would block; probably too many clients.

From here on, Mir fails to restart at all. All AP tests then fail, as clients (rightly) fail to start (they shouldn't crash though; a clear error message would be better).

Note that AP reboots the device after each test suite, hence you see multiple invocations of unity8/mir in the unity8.log file - that's normal.
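
A first-pass triage of a unity8.log along these lines could be scripted. The sketch below is only illustrative: the marker strings are the ones quoted in this report, while the function names and the category labels are made up for the example and are not part of any Mir or autopilot tooling.

```python
# Marker substrings quoted in this bug report. The keys are
# hypothetical labels chosen for this example only.
MARKERS = {
    "rejected": "REJECT",
    "hwc_set": "error during hwc set()",
    "compositor_block": "compositor_acquire would block",
}


def scan_unity8_log(text):
    """Count the log lines that contain each marker substring."""
    counts = {name: 0 for name in MARKERS}
    for line in text.splitlines():
        for name, needle in MARKERS.items():
            if needle in line:
                counts[name] += 1
    return counts


def classify(counts):
    """Rough guess at why clients failed to start (assumed categories)."""
    if counts["rejected"] > 0:
        return "rejected"    # unity8/unity-mir refused the connection
    if counts["hwc_set"] > 0 or counts["compositor_block"] > 0:
        return "mir_crash"   # one of the Mir exceptions was logged
    return "unknown"
```

For a downloaded log, something like `classify(scan_unity8_log(open("unity8.log").read()))` would give the category; because AP restarts unity8/mir between suites, the counts cover all invocations in the file.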

We need a more reliable way to reproduce this problem. Working on it (gerry)...

============================ old description of bug =====================================
This is the current crash of the dialer-app tests from http://ci.ubuntu.com/smokeng/trusty/touch/mako/76:20131219:20131218.2/5557/dialer-app-autopilot/

ProblemType: Crash
DistroRelease: Ubuntu 14.04
Package: dialer-app 0.1+14.04.20131209-0ubuntu1
Uname: Linux 3.4.0-3-mako armv7l
Architecture: armhf
CurrentDesktop: Unity
Date: Thu Dec 19 05:33:28 2013
ExecutablePath: /usr/bin/dialer-app
ExecutableTimestamp: 1386563099
ProcCmdline: dialer-app
ProcCwd: /home/phablet
Signal: 6
SourcePackage: dialer-app
StacktraceTop:
 ?? () from /lib/arm-linux-gnueabihf/libc.so.6
 raise () from /lib/arm-linux-gnueabihf/libc.so.6
 abort () from /lib/arm-linux-gnueabihf/libc.so.6
 QMessageLogger::fatal(char const*, ...) const () from /usr/lib/arm-linux-gnueabihf/libQt5Core.so.5
 ?? () from /usr/lib/arm-linux-gnueabihf/qt5/plugins/platforms/libqubuntumirclient.so
UserGroups: adm autopilot cdrom dialout dip nopasswdlogin plugdev sudo tty video

Martin Pitt (pitti) wrote :
information type: Private → Public
Apport retracing service (apport) wrote :

StacktraceTop:
 __libc_do_syscall () at ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:44
 raise () from /tmp/apport_sandbox_h37Ynb/lib/arm-linux-gnueabihf/libc.so.6
 abort () from /tmp/apport_sandbox_h37Ynb/lib/arm-linux-gnueabihf/libc.so.6
 QMessageLogger::fatal(char const*, ...) const () at global/qlogging.cpp:868
 QUbuntuIntegration::QUbuntuIntegration(QUbuntuInputAdaptorFactory*) () at integration.cc:75

Apport retracing service (apport) wrote : Stacktrace.txt
Apport retracing service (apport) wrote : StacktraceSource.txt
Apport retracing service (apport) wrote : ThreadStacktrace.txt
Changed in dialer-app (Ubuntu):
importance: Undecided → Medium
summary: - dialer-app crashed with SIGABRT in raise()
+ dialer-app crashed with SIGABRT in __libc_do_syscall()
tags: removed: need-armhf-retrace
Martin Pitt (pitti) wrote : Re: dialer-app crashed with SIGABRT in __libc_do_syscall()

This seems to come from qtubuntu src/platforms/ubuntu/ubuntucommon/integration.cc line 75:

  if (instance_ == NULL)
    qFatal("QUbuntu: Could not create application instance");

affects: dialer-app (Ubuntu) → qtubuntu (Ubuntu)
Changed in qtubuntu (Ubuntu):
importance: Medium → Undecided
summary: - dialer-app crashed with SIGABRT in __libc_do_syscall()
+ dialer-app crashed: QUbuntu: Could not create application instance
Cris Dywan (kalikiana) wrote : Re: dialer-app crashed: QUbuntu: Could not create application instance

I think this is the same as bug 1243665 which affects rss and ubuntu ui toolkit tests. According to that there's a race condition causing this error message.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qtubuntu (Ubuntu):
status: New → Confirmed
Martin Pitt (pitti) wrote :

But apparently bug 1243665 already was fixed, so perhaps it wasn't fixed for all cases, or this is similar, but not identical?

Tim Peeters (tpeeters) wrote :

I'm getting something similar here: http://jenkins.qa.ubuntu.com/job/generic-mediumtests-runner-mako/4289/console

as part of this MR: https://code.launchpad.net/~tpeeters/ubuntu-ui-toolkit/headerHeightInit/+merge/199468

I did an empty commit to see what happens when CI runs again.

Gerry Boland (gerboland)
Changed in qtubuntu (Ubuntu):
status: Confirmed → Invalid
Changed in unity-mir:
status: New → Confirmed
importance: Undecided → High
Cris Dywan (kalikiana) wrote :

To reproduce:
autopilot run -v ubuntuuitoolkit

This should be all it takes; I saw it fail on different MRs with unrelated changes. I can't reproduce it on desktop or maguro.

Gerry Boland (gerboland) wrote :

This usually happens when unity8/unity-mir rejects the incoming application connection request. It does this when it is not expecting the app startup, or if the app is not launched with a --desktop_file_hint parameter that points to a valid, existing desktop file.

As initial debugging, can you please check that:
1. the application binary is launched with the argument --desktop_file_hint=/path/to/desktop-file.desktop
2. the desktop file is in one of the only places accepted by the XDG spec: /usr/share/applications or $HOME/.local/share/applications. A custom path to an existing desktop file will not be accepted
3. the desktop file actually exists
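
The three checks above can be sketched as a small helper. This is not how unity-mir itself validates connections; it only mirrors the list, and the function name, its parameters and the returned messages are hypothetical. It also assumes the desktop file must sit directly in one of the two directories, which the comment does not state explicitly.

```python
import os
import posixpath

PREFIX = "--desktop_file_hint="


def check_desktop_file_hint(argv, home=None, exists=os.path.isfile):
    """Return None if the hint looks acceptable, else a short reason."""
    # 1. The binary must be launched with --desktop_file_hint=<path>.
    hints = [arg[len(PREFIX):] for arg in argv if arg.startswith(PREFIX)]
    if not hints:
        return "missing --desktop_file_hint argument"

    # 2. Only the two locations named in the comment are accepted.
    path = posixpath.normpath(hints[0])
    home = home or os.path.expanduser("~")
    allowed = [
        "/usr/share/applications",
        posixpath.join(home, ".local/share/applications"),
    ]
    if posixpath.dirname(path) not in allowed:
        return "desktop file is outside the accepted directories"

    # 3. The desktop file must actually exist.
    if not exists(path):
        return "desktop file does not exist"
    return None
```

The `exists` parameter is there only so the check can be tried without touching a real device filesystem.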

Future warning: soon the environment variable APP_ID will also need to be set to the app's app id when the application is launched.

The fact that autopilot is not using upstart-app-launch to launch apps is highly inconvenient, as upstart-app-launch is designed to do the right thing. --desktop_file_hint is not a long term solution.

Gerry Boland (gerboland) wrote :
Tim Peeters (tpeeters) wrote :

I executed the same tests locally on maguro, no failures: https://pastebin.canonical.com/102317/

Martin Pitt (pitti) wrote :

Gerry, so is this exclusively a problem with running tests with the --desktop_file_hint hack? Then we can mark this as a dupe of bug 1263182. Or is there still a Mir issue independent of that? Thanks!

Gerry Boland (gerboland) wrote :

@pitti - answer is yes. If AP used upstart, this issue would go away. --desktop_file_hint is not a permanent solution, but if there's a problem with it, I'd like to fix it

Gerry Boland (gerboland) wrote :

So to determine if the client crashes due to the --desktop_file_hint causing the application to be rejected, you should see a "REJECTED" line in the unity8 log file. The application rejection causes the Mir connection to be denied, and the app crashes with this error.

However, digging into this instance of the failure:
https://jenkins.qa.ubuntu.com/job/generic-mediumtests-runner-mako/4469/
the unity8 log file:
https://jenkins.qa.ubuntu.com/job/generic-mediumtests-runner-mako/4469/artifact/results/log/unity8.log
does not contain this string.

However, there are many unity8/mir crashes, which eventually leave things unrecoverable. The bad crash appears to be:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::logic_error> >'
  what(): compositor_acquire would block; probably too many clients.

Will need Mir team input to investigate this further.

Gerry Boland (gerboland)
summary: - dialer-app crashed: QUbuntu: Could not create application instance
+ Random mir failures running unity8 shell during AP tests
Gerry Boland (gerboland)
description: updated
tags: added: ci-engineering
Gerry Boland (gerboland) wrote : Re: Random mir failures running unity8 shell during AP tests
Daniel van Vugt (vanvugt) wrote :

The primary error appears to still be:
    what(): error during hwc set()
which I'm not familiar with, other than it's an Android-specific exception from Mir.

What I am familiar with is my own:
    what(): compositor_acquire would block; probably too many clients.
However that situation is impossible to reach if your buffers are acquired/released correctly. Is Unity8 doing any buffer acquisition other than snapshots?

summary: - Random mir failures running unity8 shell during AP tests
+ Random mir failures running unity8 shell during AP tests [Mir throws
+ exception: what(): error during hwc set()]
Changed in mir:
importance: Undecided → High
status: New → Triaged
kevin gunn (kgunn72) wrote :
Daniel van Vugt (vanvugt) wrote :

Well it doesn't seem to have reoccurred, so sure, call it a duplicate of bug 1240400.
