many panda health jobs timing out on booting android image

Bug #955052 reported by Paul Larson
This bug affects 1 person
Affects: LAVA Scheduler (deprecated)
Status: Invalid
Importance: High
Assigned to: Paul Larson
Milestone:

Bug Description

The health job for a lot of the android images is timing out while trying to boot the android image. I'm seeing several of them with a lot of boot messages like:
[ 1192.723419] (stk) :ldisc_install = 0
[ 1193.987365] (stk) :ldisc_install = 1
[ 1195.074768] (stk) :line disc installation timed out
[ 1195.091644] (stk) :ldisc_install = 0
[ 1196.336303] (stk) :ldisc_install = 1
[ 1197.434295] (stk) :line disc installation timed out
(see http://validation.linaro.org/lava-server/scheduler/job/15043 as an example)

This usually indicates that the binary drivers need to be installed.
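To compare how often these messages appear in a failed job versus a successful one, a small script can pull the timeout lines out of a saved serial log. This is only a sketch: the regular expression is based on the lines quoted above, and the function name `stk_timeouts` is made up for illustration.

```python
import re

# Matches kernel lines such as:
#   [ 1195.074768] (stk) :line disc installation timed out
STK_TIMEOUT = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\(stk\)\s+:line disc installation timed out"
)


def stk_timeouts(log_text):
    """Return the kernel timestamps (in seconds) of every stk timeout line."""
    return [float(m.group(1)) for m in STK_TIMEOUT.finditer(log_text)]


# Sample taken from the boot messages quoted in this report.
sample = """\
[ 1192.723419] (stk) :ldisc_install = 0
[ 1193.987365] (stk) :ldisc_install = 1
[ 1195.074768] (stk) :line disc installation timed out
[ 1195.091644] (stk) :ldisc_install = 0
[ 1196.336303] (stk) :ldisc_install = 1
[ 1197.434295] (stk) :line disc installation timed out
"""

timestamps = stk_timeouts(sample)
print(len(timestamps), "timeouts at", timestamps)
```

Running the same function over the log of a passing health check would show whether the message count actually differs between good and bad boots.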

Revision history for this message
Paul Larson (pwlars) wrote :

Added
        {
            "command": "android_install_binaries"
        },
We'll see if this corrects the problem. Trying now.
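For anyone wanting to apply the same change to several job files, here is a rough sketch of inserting that action programmatically. The report does not say where in the action list the entry was added, so placing it right after the deploy step is an assumption, as are the surrounding command names (`deploy_linaro_android_image`, `boot_linaro_android_image`), the job name, and the helper `add_install_binaries` itself.

```python
import json


def add_install_binaries(job, after_command="deploy_linaro_android_image"):
    """Return a copy of `job` with an android_install_binaries action
    inserted after the first action whose command is `after_command`.

    The job is returned unchanged (as a copy) if the action is already
    present or if no action matches `after_command`.
    """
    actions = list(job.get("actions", []))
    if any(a.get("command") == "android_install_binaries" for a in actions):
        return dict(job, actions=actions)
    for i, action in enumerate(actions):
        if action.get("command") == after_command:
            actions.insert(i + 1, {"command": "android_install_binaries"})
            break
    return dict(job, actions=actions)


# Hypothetical, heavily trimmed health job; real jobs carry parameters too.
job = {
    "job_name": "panda-android-health",  # made-up name
    "actions": [
        {"command": "deploy_linaro_android_image"},  # assumed action name
        {"command": "boot_linaro_android_image"},  # assumed action name
    ],
}

patched = add_install_binaries(job)
print(json.dumps(patched, indent=4))
```

The helper leaves the input untouched and is safe to run twice, which makes it easier to use on a directory of job definitions.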

Changed in lava-scheduler:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Paul Larson (pwlars)
milestone: none → 2012.03
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote : Re: [Bug 955052] [NEW] many panda health jobs timing out on booting android image

On Wed, 14 Mar 2012 13:44:02 -0000, Paul Larson <email address hidden> wrote:
> Public bug reported:
>
> The health job for a lot of the android images is timing out while trying to boot the android image. I'm seeing several of them with a lot of boot messages like:
> [ 1192.723419] (stk) :ldisc_install = 0
> [ 1193.987365] (stk) :ldisc_install = 1
> [ 1195.074768] (stk) :line disc installation timed out
> [ 1195.091644] (stk) :ldisc_install = 0
> [ 1196.336303] (stk) :ldisc_install = 1
> [ 1197.434295] (stk) :line disc installation timed out
> (see http://validation.linaro.org/lava-server/scheduler/job/15043 as an example)

This happens all the time for me, including cases where the boot was
deemed successful
(e.g. http://validation.linaro.org/lava-server/scheduler/job/14988/log_file,
which is a successful health check on the same board).

Cheers,
mwh

Revision history for this message
Yongqin Liu (liuyq0307) wrote :

For job "http://validation.linaro.org/lava-server/scheduler/job/15043", I think this is a network problem,
because the submit_job action at the end failed too:

  File "/srv/lava/instances/production/lib/python2.7/site-packages/lava_dispatcher/actions/launch_control.py", line 176, in submit_bundle
    result = dashboard.put_ex(json_bundle, job_name, stream)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1224, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1575, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/xmlrpclib.py", line 1264, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1292, in single_request
    self.send_content(h, request_body)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1439, in send_content
    connection.endheaders(request_body)
  File "/usr/lib/python2.7/httplib.py", line 951, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 811, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 773, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 754, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.gaierror: [Errno -2] Name or service not known

Fathi Boudra (fboudra)
Changed in lava-scheduler:
milestone: 2012.03 → 2012.04
Fathi Boudra (fboudra)
Changed in lava-scheduler:
milestone: 2012.04 → 2012.05
Revision history for this message
Fathi Boudra (fboudra) wrote :

Hi Paul, what's the status? This has been in progress for 3 months.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

I'm not sure this report serves any useful purpose any more. IIRC we fixed the stk messages either by tweaking some sysctl variable, or just by upgrading the master image -- it doesn't happen any more, in any case.

Also, the health jobs on panda are working well currently.

Changed in lava-scheduler:
status: In Progress → Invalid