LAVA Dispatcher

qemu target may fail to sync data to the image

Bug #1073899 reported by Paul Sokolovsky on 2012-11-01

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
LAVA Dispatcher	Won't Fix	Undecided	Unassigned

Bug Description

When trying the lava_test_shell action with a qemu target, there appears to be a high probability that logs, results, etc. won't be available for pickup from the underlying filesystem image after the test run. This appears to be because the following happens in quick succession:

1. Log/results files are written in qemu guest.
2. "sync" is performed in qemu guest.
3. qemu is terminated.

So it seems that step 2, sync, doesn't really flush changes to the fs image on the host immediately, and (abrupt?) termination of qemu (closing of the pipe) doesn't flush them either.

mem-stress-a7 : PASS
<LAVA_TEST_RUNNER>: 0_passfail exited with: 0
<LAVA_TEST_RUNNER>: exiting<LAVA_DISPATCHER>2012-11-01 12:51:53 PM INFO: lava_test_shell seems to have completed
<LAVA_DISPATCHER>2012-11-01 12:51:53 PM INFO: attempting a filesystem sync before power_off

sync
sync
linaro-test [rc=0]# & 0_passfail-1351770706
&& 0_passfail-1351770706
<LAVA_DISPATCHER>2012-11-01 12:51:57 PM ERROR: error processing results for: 0_passfail-1351770706
Traceback (most recent call last):
  File "/home/pfalcon/devel/linaro/lava/lava-dispatcher/lava_dispatcher/lava_test_shell.py", line 208, in get_bundle
    testruns.append(_get_test_run(results_dir, d, hwctx, swctx))
  File "/home/pfalcon/devel/linaro/lava/lava-dispatcher/lava_dispatcher/lava_test_shell.py", line 164, in _get_test_run
    testdef = json.loads(testdef)
  File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
{'test_runs': [], 'format': 'Dashboard Bundle Format 1.3'}
<LAVA_DISPATCHER>2012-11-01 12:51:57 PM INFO: [ACTION-E] lava_test_shell is finished successfully.
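
The traceback above is consistent with the test definition being empty or truncated when read back from the image: json.loads raises ValueError on an empty string. Below is a minimal sketch, written for current Python 3 rather than the 2.7 shown in the traceback, of a loader that reports that case instead of raising. The name load_testdef and its return convention are hypothetical and are not taken from lava-dispatcher.

```python
import json
import os
import tempfile


def load_testdef(path):
    """Return the parsed JSON from path, or None if it is empty or invalid.

    Hypothetical helper for illustration; not part of lava-dispatcher.
    """
    with open(path, "r") as f:
        text = f.read()
    if not text.strip():
        # An empty file is one thing a lost guest write may leave behind.
        return None
    try:
        return json.loads(text)
    except ValueError:
        # In Python 3, json.JSONDecodeError is a subclass of ValueError.
        return None
```

A caller could then log "results missing, image probably not synced" when None comes back, which is easier to diagnose than the bare ValueError.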

Related branches

lp:~mwhudson/lava-dispatcher/wait-for-console-for-lava_test_shell-bug-1073899

Rejected for merging into lp:lava-dispatcher

Paul Sokolovsky: Approve on 2012-11-08

Dave Pigott: Approve on 2012-11-06

Paul Sokolovsky (pfalcon) wrote on 2012-11-01:

#1

Doing sleep(3) before closing the qemu pipe works around this issue for me, though it's clearly a workaround, and there can be cases when a 3s sleep would not be enough.
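
A fixed sleep is either too long or, as noted, sometimes too short. One generic alternative is a bounded poll, sketched below. It assumes the dispatcher has some cheap check it can run to decide that the data has landed; that check is passed in as a callable, and both wait_until and the check are hypothetical names, not lava-dispatcher code.

```python
import time


def wait_until(check, timeout=30.0, interval=0.5):
    """Call check() repeatedly until it returns True or timeout expires.

    Returns True if check() succeeded, False if the deadline passed first.
    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With this shape the common case returns as soon as the check passes, and the timeout only bounds the worst case.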

Paul Sokolovsky (pfalcon) wrote on 2012-11-01:

#2

qemu support in lava-dispatcher uses "-drive if=%s,cache=writeback". I thought the problem might be due to cache=writeback and tried cache=writethrough and cache=none, but the problem persists. (Maybe the nano image 12.03 doesn't implement sync properly?)
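
To make this kind of experiment repeatable, the cache mode can be a parameter when the -drive argument is built. The sketch below is an illustration only: drive_option is a hypothetical helper, and the file= part and the example image name are assumptions, since the comment quotes only the if= and cache= parts of the option string.

```python
# Cache modes accepted by QEMU's -drive option.
QEMU_CACHE_MODES = ("writeback", "writethrough", "none", "directsync", "unsafe")


def drive_option(image, interface, cache="writeback"):
    """Build the argv fragment for one qemu -drive option.

    Hypothetical helper; not the code used by lava-dispatcher.
    """
    if cache not in QEMU_CACHE_MODES:
        raise ValueError("unknown cache mode: %r" % (cache,))
    return ["-drive", "file=%s,if=%s,cache=%s" % (image, interface, cache)]
```

For example, drive_option("nano.img", "sd", cache="directsync") yields the two argv entries to append to the qemu command line.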

Paul Sokolovsky (pfalcon) wrote on 2012-11-01:

#3

For completeness I also tried cache=directsync, with the same result.

So, I'm going to prepare patch for sleep() workaround.

Michael Hudson-Doyle (mwhudson) wrote on 2012-11-01: Re: [Bug 1073899] Re: qemu target may fail to sync data to the image

#4

Paul Sokolovsky <email address hidden> writes:

> For completeness I also tried cache=directsync, with the same result.
>
> So, I'm going to prepare patch for sleep() workaround.

We recently added a 'halt' to fastmodel.py to work around this same
issue. Maybe that will work for qemu.py too? Although sleep will
probably work just as well.

Cheers,
mwh

Andy Doan (doanac) wrote on 2012-11-01:

#5

The real issue, at least when I've seen it, is this:

The lava-test-runner gets launched before the root shell prompt is active. The test it runs is really fast, so we get ready to exit before there's a console. Then the qemu.py code calls sync/halt, but they don't get executed because the console isn't ready to act on them yet. We then shut down and the changes get lost.

We need to change the upstart entry for this job to wait on the root console to be ready.
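
The race described here can be pictured with a dispatcher-side guard that refuses to send sync/halt until a prompt has actually been seen. This is only a sketch and is neither the upstart change proposed above nor the linked branch: wait_for_prompt and shutdown_cleanly are hypothetical names, the console is modelled as any object with a read(n) method, and the prompt string is copied from the log in the bug description.

```python
import io
import time

PROMPT = "linaro-test [rc=0]# "  # as it appears in the log above


def wait_for_prompt(stream, prompt=PROMPT, timeout=10.0):
    """Read from stream until its output ends with prompt, or time out."""
    deadline = time.monotonic() + timeout
    tail = ""
    while time.monotonic() < deadline:
        ch = stream.read(1)
        if not ch:
            time.sleep(0.05)  # nothing available yet; try again shortly
            continue
        # Keep only as much history as is needed to match the prompt.
        tail = (tail + ch)[-len(prompt):]
        if tail == prompt:
            return True
    return False


def shutdown_cleanly(stream, send, prompt=PROMPT, timeout=10.0):
    """Send sync and then halt, each only once a prompt has been seen."""
    if not wait_for_prompt(stream, prompt, timeout):
        return False  # console never became ready; nothing was sent
    send("sync")
    if not wait_for_prompt(stream, prompt, timeout):
        return False  # sync did not return to a prompt in time
    send("halt")
    return True
```

In this model, a False return tells the caller that the commands were never acted on, which is exactly the case where the results would otherwise be lost silently.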

Antonio Terceiro (terceiro) wrote on 2012-11-01:

#6

Is there a reason why we start those tests from an upstart job instead
of waiting for the shell prompt and then launch the test from there?

Michael Hudson-Doyle (mwhudson) wrote on 2012-11-01:

#7

Antonio Terceiro <email address hidden> writes:

> Is there a reason why we start those tests from an upstart job instead
> of waiting for the shell prompt and then launch the test from there?

It felt like it would be more reliable and depend less on serial. But
it turns out we send commands to the prompt anyway and so depend on
serial...

Cheers,
mwh

Andy Doan (doanac) wrote on 2012-11-02:

#8

Still, depending on serial for attempting to shut down cleanly is better than depending on it to do anything at all.

Michael Hudson-Doyle (mwhudson) wrote on 2012-11-05:

#9

So this is just as simple as the linked branch, right?

Paul Sokolovsky (pfalcon) wrote on 2012-11-08:

#10

Confirming the linked branch fixes the issue for me.

Changed in lava-dispatcher:
status:	New → In Progress

Alan Bennett (akbennett) wrote on 2013-06-11:

#11

Fixing this bug does not fit in our development plans. Moving forward, all LAVA bugs will be disposed/scrubbed/triaged weekly.

Changed in lava-dispatcher:
status:	In Progress → Won't Fix
