Unable to resume the test correctly

Bug #1393291 reported by Vanessa Chang
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
PlainBox (Toolkit) | Invalid | Critical | Unassigned |

Bug Description

When a session is resumed, the result selection dialog is not available.
The job therefore stays pending.

Steps to reproduce:
1) install dev version: ./boxer install -p somerville -r daily
2) select BIOS whitelist to run
3) Check if the tests can be completed

Expected result:
All the tests can be completed

Actual result:
Sometimes, when the system resumes from the previous session, the result selection dialog is not available.
The job therefore stays pending, and the session has to be restarted.

Vanessa Chang (vanessa-chang) wrote :

plainbox:

  Installed: 0.17+bzr3408+pkg19~ubuntu14.04.1
  Candidate: 0.17+bzr3408+pkg19~ubuntu14.04.1
  Version table:
 *** 0.17+bzr3408+pkg19~ubuntu14.04.1 0
        500 http://ppa.launchpad.net/checkbox-dev/ppa/ubuntu/ trusty/main amd64 Packages
        100 /var/lib/dpkg/status
     0.5.3-2 0
        500 http://archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages

Po-Hsu Lin (cypressyew)
Changed in plainbox:
assignee: nobody → Zygmunt Krynicki (zkrynicki)
Vanessa Chang (vanessa-chang) wrote :
Zygmunt Krynicki (zyga)
Changed in plainbox:
status: New → In Progress
importance: Undecided → Critical
Zygmunt Krynicki (zyga) wrote :

Looking at the attached log files, I see the following two problems:

plainbox.impl.session.resume.CorruptedSessionError: Unknown jobs remaining: 2013.com.canonical.certification::__audio__, 2013.com.canonical.certification::__bluetooth__, 2013.com.canonical.certification::__disk__, 2013.com.canonical.certification::__graphics__, 2013.com.canonical.certification::__info__, 2013.com.canonical.certification::__mediacard__, 2013.com.canonical.certification::__miscellanea__, 2013.com.canonical.certification::__monitor__, 2013.com.canonical.certification::__networking__, 2013.com.canonical.certification::__power-management__, 2013.com.canonical.certification::__stress__, 2013.com.canonical.certification::__suspend__, 2013.com.canonical.certification::__usb__, 2013.com.canonical.certification::__wireless__, 2014.com.canonical.ce::somerville

And:

2014-11-13 10:10:28 [pid:2692, thread:MainThread, reltime:300061ms] ERROR plainbox.session.resume: BUG in session resume logic / assumptions
2014-11-13 10:10:28 [pid:2692, thread:MainThread, reltime:300061ms] ERROR checkbox.ng.dbus_ex.decorators: DBus method call failed

This is a bug in the resume logic, as the log message says. There is a corresponding log entry for the D-Bus part of the call being made. Shortly after, we see a D-Bus-specific bug:

  File "/usr/lib/python3/dist-packages/plainbox/impl/session/state.py", line 738, in trim_job_list
    self.on_job_removed(job)
  File "/usr/lib/python3/dist-packages/plainbox/impl/signal.py", line 106, in __call__
    self.fire(args, kwargs)
  File "/usr/lib/python3/dist-packages/plainbox/impl/signal.py", line 100, in fire
    listener(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/checkbox_ng/service.py", line 913, in _job_removed
    self.remove_managed_object(job_wrapper)
  File "/usr/lib/python3/dist-packages/checkbox_ng/dbus_ex/service.py", line 543, in remove_managed_object
    self.remove_managed_object_list([obj])
  File "/usr/lib/python3/dist-packages/checkbox_ng/dbus_ex/service.py", line 564, in remove_managed_object_list
    new.remove(obj)
ValueError: list.remove(x): x not in list

I don't think that other bug is relevant yet; I believe it only happens when the session resume code misbehaves.
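
The failure mode in the traceback above can be sketched in isolation. This is a minimal illustration, not the checkbox_ng code: `ManagedObjectRegistry`, its `remove` method, and the job names are hypothetical stand-ins. It only shows that `list.remove()` raises `ValueError` when a removal is requested for an object that was never registered, which is what one would expect if a misbehaving resume left the session's job list and the list of exported objects out of sync.

```python
class ManagedObjectRegistry:
    """Hypothetical stand-in for a service that tracks exported objects."""

    def __init__(self, objects):
        self._objects = list(objects)

    def remove(self, obj):
        # Same pattern as in the traceback: copy the list, then remove.
        new = list(self._objects)
        new.remove(obj)  # raises ValueError if obj was never registered
        self._objects = new


registry = ManagedObjectRegistry(["job-a", "job-b"])
registry.remove("job-a")  # fine: the object is registered

raised = False
try:
    # "job-c" was never registered, e.g. after an inconsistent resume.
    registry.remove("job-c")
except ValueError:
    raised = True
```

If this reading is right, the `ValueError` is a symptom rather than a cause: it appears only once the two lists already disagree.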

Changed in plainbox:
milestone: none → 0.17
Zygmunt Krynicki (zyga) wrote :

Using:
 - branch https://github.com/checkbox/checkbox/commits/fix-1393291
 - tarball: https://bugs.launchpad.net/plainbox/+bug/1393291/+attachment/4262128/+files/plainbox_resume_bug.tar.gz
   (save the session as bug-1393291 in ~/.cache/plainbox/sessions)

This error can now be reproduced each time with:
 - plainbox session show bug-1393291 --resume --flag rewrite-log-pathnames --flag ignore-job-checksums

Note that you will need any version of the somerville provider (the details don't matter).

Zygmunt Krynicki (zyga) wrote :

With those things applied, the output of the 'plainbox session show' command above is: http://paste.ubuntu.com/9074906/

Zygmunt Krynicki (zyga) wrote :

Update: I had a broken copy of the OEM provider. With the correct copy, it doesn't crash for me.

My new working theory is that checkbox-gui, while resuming, artificially limits the set of available jobs, thus causing that problem. I will attempt to test that theory next.
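
That theory can be illustrated with a small sketch. It is not the plainbox resume code: `resume_session`, the local `CorruptedSessionError` class, and the shortened job ids are hypothetical, and the check is assumed to be a simple set difference between the jobs recorded in the saved session and the jobs the caller makes available. Under that assumption, a caller that passes a restricted job list gets an "Unknown jobs remaining" error even though the saved session itself is intact.

```python
class CorruptedSessionError(Exception):
    """Local stand-in; not the plainbox exception class."""


def resume_session(saved_job_ids, available_job_ids):
    """Return the resumable job ids, or raise if some are unknown."""
    unknown = sorted(set(saved_job_ids) - set(available_job_ids))
    if unknown:
        raise CorruptedSessionError(
            "Unknown jobs remaining: " + ", ".join(unknown))
    return sorted(saved_job_ids)


saved = ["__audio__", "__disk__", "somerville"]

# Full job list: resume succeeds.
full_result = resume_session(
    saved, ["__audio__", "__disk__", "somerville", "__usb__"])

# Restricted job list (the suspected checkbox-gui behaviour): resume fails.
try:
    resume_session(saved, ["somerville"])
    restricted_error = None
except CorruptedSessionError as exc:
    restricted_error = str(exc)
```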

Jerry Kao (jerry.kao) wrote :

I hit this issue when testing watauga4.0.

Scenario:
Launch checkbox with checkbox-cli.
After running the reboot/power-off test and relaunching checkbox, the session cannot continue properly, no matter which of the following options is chosen:
What do you want to do with that job?
  s => skip that job
  p => mark it as passed and continue
  f => mark it as failed and continue
  r => run it again

A critical error message is shown, as follows:
CRITICAL plainbox.crashes: Executable 'checkbox-launcher' invoked with Namespace(command=<checkbox_ng.commands.launcher.LauncherCommand object at 0x321a650>, debug_console=False, debug_interrupt=False, dont_suppress_output=False, dry_run=False, launcher='/usr/bin/checkbox-cli', log_level=None, non_interactive=False, pdb=False, trace=[]) has crashed

Detailed logs are tarred and attached.

Roxanne Fan (matrixf) wrote :

I encountered the same issue when running the somerville tests via checkbox-gui.

----------------
checkbox:
  Installed: 0.18-0ubuntu2
  Candidate: 0.18-0ubuntu2
  Version table:
 *** 0.18-0ubuntu2 0
        500 http://tw.archive.ubuntu.com/ubuntu/ utopic/universe amd64 Packages
        100 /var/lib/dpkg/status

plainbox:
  Installed: 0.14~ppa~ubuntu14.04.1
  Candidate: 0.14~ppa~ubuntu14.04.1
  Version table:
 *** 0.14~ppa~ubuntu14.04.1 0
        100 /var/lib/dpkg/status
     0.5.4-1 0
        500 http://tw.archive.ubuntu.com/ubuntu/ utopic/universe amd64 Packages

Zygmunt Krynicki (zyga) wrote :

Jerry, I've reported another bug related to comment 8 here: https://bugs.launchpad.net/plainbox/+bug/1396532

Please track each suspend resume issue separately (new bug). It's easier to say one is a duplicate rather than trying to split bugs out of comments.

Zygmunt Krynicki (zyga) wrote :

Roxanne: please report separate bugs next time. I'm not sure a single issue is causing all of this.

Vanessa Chang (vanessa-chang) wrote :

Unable to reproduce this issue on the same unit with the currently released plainbox version:

Test 1 run can pass

$ dpkg -l|egrep 'checkbox|plainbox|canonical'
ii canonical-oem-keyring 2009.07.23+build3 all GnuPG keys of Canoncical OEM archives
ii canonical-poke 0.4.1+94~ubuntu14.04.1 all send "I am alive" ping to Canonical
ii checkbox 0.17.9.1~ubuntu14.04.1 amd64 System testing application
ii checkbox-certification-tools 0.19~ubuntu14.04.1 all Checkbox Certification Tools
ii checkbox-gui 0.28~ppa~ubuntu14.04.1 amd64 QML based interface for system testing based on PlainBox.
ii checkbox-hw-collection 0.17.9.1~ubuntu14.04.1 amd64 CLI tool for collecting HW information from a system
ii checkbox-ng 0.14~ppa~ubuntu14.04.1 all PlainBox based test runner
ii checkbox-ng-service 0.14~ppa~ubuntu14.04.1 all CheckBox D-Bus service
ii plainbox 0.16~ppa~ubuntu14.04.1 all toolkit for software and hardware integration testing
ii plainbox-glmark2-es2-meta 0.11~ppa~ubuntu14.04.1 amd64 metapackage to selectively install glmark2-es2
ii plainbox-insecure-policy 0.16~ppa~ubuntu14.04.1 all policykit policy required to use plainbox (insecure version)
ii plainbox-provider-certification-client 0.11~ppa~ubuntu14.04.1 all Client Certification
ii plainbox-provider-checkbox 0.15~ppa2~ubuntu14.04.1 amd64 CheckBox provider for PlainBox
ii plainbox-provider-oem 0.1ubuntu36 all plainbox oem provider
ii plainbox-provider-oem-somerville 0.1ubuntu36 all plainbox oem provider for stella
ii plainbox-provider-resource-generic 0.13~ppa~ubuntu14.04.1 amd64 CheckBox generic resource jobs provider
ii python3-checkbox 0.17.9.1~ubuntu14.04.1 all CheckBox python3 library
ii python3-checkbox-ng 0.14~ppa~ubuntu14.04.1 all PlainBox based test runner (Python 3 library)
ii python3-checkbox-support 0.14~ppa~ubuntu14.04.1 all collection of Python modules used by PlainBox providers
ii python3-plainbox 0.16~ppa~ubuntu14.04.1 all toolkit for software and hardware testing (python3 module)

Zygmunt Krynicki (zyga) wrote : Re: [Bug 1393291] Re: Unable to resume the test correctly

I haven't really fixed anything. Whatever was causing that is still
possible. I have a few theories and I'll try to create scenarios where I
can reproduce this bug.

Thanks
If you *ever* encounter this again, don't touch anything, just leave the
box as is and let me connect to it.

Thanks
ZK

Zygmunt Krynicki (zyga) wrote :

I'm giving up on this bug. I cannot reproduce it at all. I'll do some extra analysis but I think this is it. If it happens again, please reopen this report.

Changed in plainbox:
status: In Progress → Incomplete
assignee: Zygmunt Krynicki (zkrynicki) → nobody
Zygmunt Krynicki (zyga)
Changed in plainbox:
status: Incomplete → Invalid