CI timing out with no logs

Bug #1407132 reported by Ben Nemec
This bug affects 3 people
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Gregory Haynes

Bug Description

I've been seeing a lot of CI jobs timing out with no logs to help debug the problem. Opening this bug to have a place to track those issues (there are multiple similar but different symptoms).

One example:
2014-12-19 17:58:43.187 | Waiting for the overcloud stack to be ready
2014-12-19 17:58:43.187 | + wait_for_stack_ready 360 10 overcloud
2014-12-19 19:50:02.083 | 2014-12-19 19:50:02,060 - testenv-client - ERROR - The command hasn't completed but the testenv worker has released the environment. Killing all processes.
2014-12-19 19:50:02.129 | /opt/stack/new/tripleo-ci/toci_gate_test.sh: line 69: 11443 Killed ./testenv-client -b $GEARDSERVER:4730 -t $TIMEOUT_SECS -- ./toci_devtest.sh
2014-12-19 19:50:02.138 | ERROR: the main setup script run by this job failed - exit code: 137
2014-12-19 19:50:02.139 | please look at the relevant log files to determine the root cause

No seed_logs were collected, so there is nothing to show why the deployment didn't complete in a reasonable time: http://logs.openstack.org/65/137465/2/check-tripleo/check-tripleo-ironic-overcloud-f20-nonha/04c678a/logs/

Giulio Fidente (gfidente) wrote :

It does not seem to be related to the distro; I got it with the precise job check-tripleo-ironic-overcloud-precise-nonha as well.

Changed in tripleo:
assignee: nobody → Gregory Haynes (greghaynes)
status: Triaged → In Progress
Changed in tripleo:
importance: Medium → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-incubator (master)

Reviewed: https://review.openstack.org/145566
Committed: https://git.openstack.org/cgit/openstack/tripleo-incubator/commit/?id=e5e46a69df3972e93b6ac56b6922d6259119730d
Submitter: Jenkins
Branch: master

commit e5e46a69df3972e93b6ac56b6922d6259119730d
Author: Gregory Haynes <email address hidden>
Date: Wed Jan 7 11:55:49 2015 -0800

    Make wait_for use getopt and add walltime support

    We currently use wait_for which does not account for time spent blocking
    during COMMAND. This leads to issues where it is hard to calculate time
    we want to spend waiting for something. Modifying wait_for and
    wait_for_stack_ready to support walltime based timeouts.

    Closes-Bug: #1407132

    Change-Id: Icdc626ef8075fbd2f9e7cb7c011a12351c815e09
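
To illustrate the difference the commit message describes, here is a minimal sketch of a walltime-based wait loop. It is not the tripleo-incubator code: the function name wait_for_walltime and its argument order are hypothetical, chosen only for this example. If the arguments in the log above (wait_for_stack_ready 360 10 overcloud) are read as 360 attempts with a 10 second sleep, the nominal budget is 3600 seconds, yet the job was killed about 6679 seconds later (17:58:43 to 19:50:02) without the wait ever giving up, which is consistent with time spent inside COMMAND not being counted. Checking elapsed wall-clock time against a deadline avoids that.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration only (not the actual wait_for).
# Usage: wait_for_walltime TIMEOUT_SECS DELAY_SECS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 once TIMEOUT_SECS of wall-clock
# time has passed, including the time spent blocking inside COMMAND.
wait_for_walltime() {
    local timeout=$1 delay=$2
    shift 2

    # SECONDS is a bash builtin that counts seconds since the shell started.
    local deadline=$(( SECONDS + timeout ))

    while true; do
        if "$@"; then
            return 0
        fi
        # Compare against the deadline rather than counting iterations, so a
        # slow COMMAND cannot stretch the total wait past the timeout.
        if (( SECONDS >= deadline )); then
            echo "Timed out after ${timeout}s waiting for: $*" >&2
            return 1
        fi
        sleep "$delay"
    done
}
```

A call such as wait_for_walltime 3600 10 some_check overcloud (where some_check stands in for whatever command reports readiness) would then stop roughly an hour after it started, however long each check takes. The check that is already running when the deadline passes is still allowed to finish, so the total can overshoot by the duration of one COMMAND run.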

Changed in tripleo:
status: In Progress → Fix Released