lava-test-shell crashes when commands time out and aborts the entire job

Bug #1130814 reported by vishal
This bug affects 1 person
Affects: LAVA Dispatcher
Status: Fix Released
Importance: Medium
Assigned to: Nicholas Schutt
Milestone: 2013.03

vishal (vishalbhoj)
description: updated
Revision history for this message
Antonio Terceiro (terceiro) wrote :

Hi Vishal

lava-test-shell timed out because the boot failed: the shell prompt never appeared. This is most probably a problem with the build.

When the boot fails, lava-test-shell finishes with a critical error, aborting the job. The next action wouldn't be able to run anyway.

Changed in lava-dispatcher:
status: New → Invalid
Revision history for this message
vishal (vishalbhoj) wrote :

Hi Antonio,

The booting of the image does complete successfully. Here is the result bundle:
https://validation.linaro.org/lava-server/dashboard/streams/private/team/linaro/android-daily/bundles/e14c21059628131a3f3f4be1924e6f76cd3fde87/eaebc2de-7ac9-11e2-a9d7-fa163e618683/

It does reach a shell prompt and then starts executing lava-test-runner. Here is the relevant section from the logs:
"root@linaro# /data/lava/bin/lava-test-runner"

The test execution in lava-test-shell may time out because the test itself is buggy. Marking this as a critical error and aborting the job doesn't seem to be the right thing to do.

Changed in lava-dispatcher:
status: Invalid → Confirmed
importance: Undecided → Medium
assignee: nobody → Antonio Terceiro (terceiro)
Revision history for this message
vishal (vishalbhoj) wrote :

Here is some relevant discussion from IRC. There are a few other cases that need to be handled as part of cleanup:

<bhoj> terceiro, I wanted to understand how the execution is handled. I am aware that the device is booted between every lava-test-shell call.
<bhoj> How are things structured when we have a mix of lava-test-shell and lava-android-test ?
<terceiro> bhoj: re your last comment - you are right - I misread the log
<terceiro> bhoj: after lava-test-shell finishes, it leaves the board alone, so the lava-android-test action should work fine
<terceiro> that is indeed some bug that we have to investigate
<bhoj> terceiro, okay.
<terceiro> I think because of the timeout, some result file that was assumed to exist ended up not being created
<bhoj> terceiro, okay. Maybe as part of cleanup at timeout the running process needs to be killed so that it doesn't affect the steps which follow.
<terceiro> bhoj: yeah
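
The cleanup suggested at the end of this discussion (kill the still-running process when the timeout fires, so it cannot interfere with later steps) could look roughly like the sketch below. This is only an illustration on a local process; in LAVA the test runs on the target device, and the helper name run_with_timeout and its return shape are assumptions, not code from lava-dispatcher.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout):
    """Run cmd and kill it if it exceeds timeout seconds.

    Hypothetical helper: returns (returncode, output, timed_out).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    try:
        out, _ = proc.communicate(timeout=timeout)
        return proc.returncode, out, False
    except subprocess.TimeoutExpired:
        # Cleanup at timeout: stop the process so it does not keep
        # running underneath the actions that follow.
        proc.kill()
        # Reap the process and collect whatever output it produced.
        out, _ = proc.communicate()
        return proc.returncode, out, True


if __name__ == "__main__":
    # A command that would run far longer than the allowed time.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    rc, out, timed_out = run_with_timeout(slow, timeout=1)
    print("timed out:", timed_out)
```

Returning a timed_out flag instead of raising leaves the decision about whether to abort or carry on to the caller.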

summary: - lava-android-test doesn't run when lava-test-shell fails
+ lava-test-shell crashes when commands time out and abort the entire job
summary: - lava-test-shell crashes when commands time out and abort the entire job
+ lava-test-shell crashes when commands time out and aborts the entire job
Changed in lava-dispatcher:
assignee: Antonio Terceiro (terceiro) → Nicholas Schutt (nick-schutt)
Changed in lava-dispatcher:
status: Confirmed → In Progress
milestone: none → 2013.02
Revision history for this message
Nicholas Schutt (nick-schutt) wrote :

Added try/except signaling to lava_test_shell.py and job.py so that the job catches a LavaTestShellTimeout, reports an error for the lava_test_shell step, and continues with the remaining actions.

Proposed for merge:

  lp:~nick-schutt/lava-dispatcher/nicks-1130814
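
The approach described in this comment can be sketched as follows. This is a minimal illustration of the catch, report, and continue pattern, not the code in the proposed branch: only the exception name LavaTestShellTimeout comes from the comment, while run_actions and the result tuples are assumptions made for the example.

```python
class LavaTestShellTimeout(Exception):
    """Stand-in for the exception named above; the real definition
    lives in lava-dispatcher and is not shown in this report."""


def run_actions(actions):
    """Run (name, callable) pairs in order and collect results.

    Hypothetical job loop: a timeout in one action is recorded as a
    failure for that step instead of aborting the whole job.
    """
    results = []
    for name, action in actions:
        try:
            action()
            results.append((name, "pass", ""))
        except LavaTestShellTimeout as exc:
            # Report an error for this step only, then move on to
            # the next action in the job.
            results.append((name, "fail", str(exc)))
    return results
```

With this structure, a lava_test_shell step that times out is listed as failed, and a following lava_android_test step still gets a chance to run.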

Changed in lava-dispatcher:
milestone: 2013.02 → 2013.03
Changed in lava-dispatcher:
status: In Progress → Fix Committed
Fathi Boudra (fboudra)
Changed in lava-dispatcher:
status: Fix Committed → Fix Released