Rerun of last test after resume causes test run to end

Bug #1410501 reported by Chris Gregan
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Checkbox Provider - Base
Fix Released
Critical
Sylvain Pineau

Bug Description

Version: 1.14.2
Env: Trusty

Description:
After resuming from a shutdown, and selecting "rerun last test" the test run is ended and user is prompted to select what type of report they would like. Report shows results only up to the last test run before resume. Completed progress bar in UI shows "NaN%".

Steps to Reproduce:
1) Select Audio, Ethernet, Graphics, Firmware, and Wireless test suites
2) Begin testing
3) Once Ethernet tests begin power off the system
4) Power back on and re-launch CDTS
5) Select Rerun previous test

Expected result:
Previous test is run and suite continues where it left off.

Related branches

Chris Gregan (cgregan)
information type: Proprietary → Public
Revision history for this message
Sylvain Pineau (sylvain-pineau) wrote :

Tested in a fresh VM with 14.04.1 and CDTS installed from the 0.14.2 tarball.
I powered off the VM when running ethernet/maximum-bandwidth_eth0
I resumed my session and selected rerun the last test and it worked for me.

I would be interested if you're able to reproduce it to get the terminal output where you started canonical-driver-test-suite.
Especially the following sequence just after clicking on rerun last test:

ConnectJobReceivers
GuiEngine::ConnectJobReceivers - Done
computed next job 65
Running Job (RunJobs) "2013.com.canonical.certification::ethernet/maximum_bandwidth_eth0"
GuiEngine::SetSessionStateMetadata()
   "/plainbox/SessionStateLegacyAPICompatImpl/140205062246696"
flags : "incomplete"
running_job_name: "/plainbox/job/d424929f96daa71008ebf6a74b8ea774767359932372587574992a109419e89e"
title : "GuiEngine"
app_id : "com.canonical.checkbox-gui"
GuiEngine::RunJobs - Done
GuiEngine::CatchallShowInteractiveUISignalsHandler
GuiEngine::CatchallShowInteractiveUISignalsHandler - Done

Changed in cdts:
status: Confirmed → Incomplete
Revision history for this message
Sylvain Pineau (sylvain-pineau) wrote :
Revision history for this message
Chris Gregan (cgregan) wrote :

dpkg -l | egrep 'plainbox|checkbox'

pastebin.ubuntu.com/9761965/

Revision history for this message
Chris Gregan (cgregan) wrote :

.cache/plainbox dir for analysis

Revision history for this message
Sylvain Pineau (sylvain-pineau) wrote :

Thanks Chris, there is this error reported by plainbox on resume:

2015-01-13 15:15:18 [pid:2094, thread:MainThread, reltime:159980ms] ERROR checkbox.ng.dbus_ex.decorators: DBus method call failed
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/checkbox_ng/dbus_ex/decorators.py", line 224, in sanity
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/checkbox_ng/service.py", line 973, in Resume
    self.native.resume()
  File "/usr/lib/python3/dist-packages/plainbox/impl/session/legacy.py", line 149, in resume
    self._commit_manager()
  File "/usr/lib/python3/dist-packages/plainbox/impl/session/legacy.py", line 202, in _commit_manager
    self._commit_resume()
  File "/usr/lib/python3/dist-packages/plainbox/impl/session/legacy.py", line 226, in _commit_resume
    self.job_list, last_storage, lambda session: self)
  File "/usr/lib/python3/dist-packages/plainbox/impl/session/manager.py", line 338, in load_session
    ).resume(data, early_cb)
  File "/usr/lib/python3/dist-packages/plainbox/impl/session/resume.py", line 231, in resume
    return self._resume_json(json_repr, early_cb)
  File "/usr/lib/python3/dist-packages/plainbox/impl/session/resume.py", line 260, in _resume_json
    return helper.resume_json(json_repr, early_cb)
  File "/usr/lib/python3/dist-packages/plainbox/impl/session/resume.py", line 491, in resume_json
    return self._build_SessionState(session_repr, early_cb)
  File "/usr/lib/python3/dist-packages/plainbox/impl/session/resume.py", line 517, in _build_SessionState
    self._restore_SessionState_jobs_and_results(session, session_repr)
  File "/usr/lib/python3/dist-packages/plainbox/impl/session/resume.py", line 553, in _restore_SessionState_jobs_and_results
    self._process_job(session, jobs_repr, results_repr, job_id)
  File "/usr/lib/python3/dist-packages/plainbox/impl/session/resume.py", line 621, in _process_job
    _("Definition of job {!r} has changed").format(job_id))
plainbox.impl.session.resume.IncompatibleJobError: Definition of job '2013.com.canonical.certification::suspend/iperf_before_suspend_wifi_auto_wlan0' has changed

Changed in cdts:
status: Incomplete → New
status: New → In Progress
Revision history for this message
Sylvain Pineau (sylvain-pineau) wrote :

zyga: spineau: well, I think we should fix checkbox-ng and look at all the other parts that load a session to see if they are handling errors correctly
zyga: spineau: one other possibility
zyga: spineau: is that the resumed job is local but (?) somehow it has been re-started rather than loaded from the log file
zyga: spineau: and a remote possibility that we've changed something and there's a subtle change in the job checksum
zyga: spineau: but that's rather strange as we have lots of tests for that
zyga: spineau: so it's probably something else
zyga: spineau: hmm
spineau: zyga: iperf_before_suspend_wifi_auto_wlan0 is generated for sure
zyga: spineau: yep
spineau: by a local job
zyga: spineau: but it's generated from a log file
zyga: spineau: (on resume)
zyga: spineau: unless checkbox-gui actually runs local jobs before resuming stuff
zyga: spineau: in any case, for debugging, I think we could store a copy of every job
zyga: spineau: keyed from the checksum of that job
zyga: spineau: so that the recovery path of a case like this can give an informed opinion
zyga: spineau: like a diff of the canonical form of both jobs
zyga: spineau: it should be simple to add and would help us along the way towards making compatible job upgrades possible

Revision history for this message
Sylvain Pineau (sylvain-pineau) wrote :

2015-01-13 15:15:18 [pid:2094, thread:MainThread, reltime:159960ms] WARNING plainbox.ctrl: Local job 2013.com.canonical.certification::suspend/iperf_before_suspend_wifi_auto produced job 2013.com.canonical.certification::suspend/iperf_before_suspend_wifi_auto_wlan0 that collides with an existing job 2013.com.canonical.certification::suspend/iperf_before_suspend_wifi_auto_wlan0 (from 2013.com.canonical.certification::suspend/iperf_after_suspend_wifi_auto:1-9), the new job was discarded

between checkbox-gui to plainbox, someone is running a local job. I'll check which level is responsibleof that. ($1 for checkbox-gui :)

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

That last comment is misleading. Plainbox does the 'side effect' of running a local (or any other kind) of job on resume. What happens here, somehow, not sure how, is that the job seems to be different than the one we are starting with. This is perfectly possible when a definition of the job changes after the session was created.

What I'm interested in here, since we have both the before and after data right in this session, is to see the dump of both jobs.

Insert a "import pdb; pdb.set_trace()" line just after that line and inspect the generated job and the old job (with the same ID) in the session.

Changed in plainbox-provider-checkbox:
assignee: nobody → Sylvain Pineau (sylvain-pineau)
importance: Undecided → Critical
status: New → In Progress
milestone: none → 0.18
Revision history for this message
Sylvain Pineau (sylvain-pineau) wrote :

I don't know how we could prevent such errors but two jobs generated from two different local jobs had the same job id (a copy paste error).

Look at the guilty local job and notice the "before" in the generated job definition where we should read "after":

plugin: local
id: suspend/iperf_after_suspend_wifi_auto
estimated_duration: 30.0
requires:
 device.category == 'WIRELESS'
command:
 cat <<'EOF' | run_templates -s 'udev_resource | filter_templates -w "category=WIRELESS" | awk "/path: / { print \$2 }" | xargs -n 1 sh -c "for i in \`ls /sys\$0/net 2>/dev/null\`; do echo \$0 \$i; done"'
 plugin: shell
 id: suspend/iperf_before_suspend_wifi_auto_$2
 depends: suspend/suspend_advanced
 requires: device.path == "$1"
 user: root
 environ: TEST_TARGET_FTP TEST_TARGET_IPERF TEST_USER TEST_PASS
 command: network test -i $2 -t iperf
 _description:
  This test executes iperf connection performance/stability against all the ethernet devices found on the system before suspend.
 EOF
_description:
 This is an automated test to gather some info on the current state of your network devices. If no devices are found, the test will exit with an error.

Revision history for this message
Sylvain Pineau (sylvain-pineau) wrote :

Nothing to be done to fix ihv-wireless.whitelist.

Changed in plainbox-provider-checkbox:
status: In Progress → Fix Committed
Changed in cdts:
status: In Progress → Fix Committed
Chris Gregan (cgregan)
Changed in cdts:
status: Fix Committed → Fix Released
status: Fix Released → Fix Committed
Changed in plainbox-provider-checkbox:
status: Fix Committed → Fix Released
Revision history for this message
Chris Gregan (cgregan) wrote :
Changed in cdts:
status: Fix Committed → Confirmed
Revision history for this message
Sylvain Pineau (sylvain-pineau) wrote :
Revision history for this message
Chris Gregan (cgregan) wrote :

Ah...we need to up the version then. Let me recheck using 1.14.3

Chris Gregan (cgregan)
Changed in cdts:
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.