TripleO job sometimes failing due to wrong timeout of 20 mins

Bug #1591102 reported by Steven Hardy
This bug affects 1 person
Affects   Status        Importance  Assigned to    Milestone
tripleo   Fix Released  Critical    Paul Belanger

Bug Description

We're seeing jobs set the wrong timeout:

2016-06-10 06:39:35.756326 + OVERCLOUD_DEPLOY_TIMEOUT=20
2016-06-10 06:39:35.756366 + export 'OVERCLOUD_DEPLOY_ARGS=--libvirt-type=qemu -t 20'
2016-06-10 06:39:35.756403 + OVERCLOUD_DEPLOY_ARGS='--libvirt-type=qemu -t 20'
2016-06-10 06:39:35.756433 + export OVERCLOUD_UPDATE_ARGS=

It looks like DEVSTACK_GATE_TIMEOUT has changed, but only for the nonha job. I'm not sure why, but the result is that overcloud deployments get only 20 minutes before timing out (instead of the usual 80).

https://github.com/openstack-infra/tripleo-ci/blob/4750ff460b41c9f3697cd63c9ee0c4ed2cf933e9/toci_gate_test.sh#L53

http://logs.openstack.org/11/326511/2/check-tripleo/gate-tripleo-ci-centos-7-nonha/e572c14/console.html#_2016-06-10_06_39_35_756326
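For illustration only, here is a minimal sketch of how a deploy timeout could be derived from the whole-job timeout so that a too-short budget fails loudly instead of silently passing `-t 20` to the deploy. This is not the tripleo-ci logic; the function name `compute_deploy_timeout` and the values of `RESERVE_MINS`, `MAX_DEPLOY_MINS` and `MIN_DEPLOY_MINS` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the tripleo-ci implementation.
# Derive the overcloud deploy timeout (minutes) from the whole-job timeout.
# RESERVE_MINS, MAX_DEPLOY_MINS and MIN_DEPLOY_MINS are illustrative values.
RESERVE_MINS=${RESERVE_MINS:-60}        # time kept back for the other job steps
MAX_DEPLOY_MINS=${MAX_DEPLOY_MINS:-80}  # the usual deploy budget
MIN_DEPLOY_MINS=${MIN_DEPLOY_MINS:-40}  # below this, treat the budget as broken

compute_deploy_timeout() {
    local job_mins=$1
    local deploy=$(( job_mins - RESERVE_MINS ))
    # Never ask for more than the usual deploy budget.
    if [ "$deploy" -gt "$MAX_DEPLOY_MINS" ]; then
        deploy=$MAX_DEPLOY_MINS
    fi
    # Fail loudly instead of deploying with a budget that is too short.
    if [ "$deploy" -lt "$MIN_DEPLOY_MINS" ]; then
        echo "ERROR: a ${job_mins}-minute job leaves only ${deploy} minutes for the overcloud deploy" >&2
        return 1
    fi
    echo "$deploy"
}

# Example: a 159-minute job gets the usual 80-minute deploy budget.
OVERCLOUD_DEPLOY_TIMEOUT=$(compute_deploy_timeout 159)
export OVERCLOUD_DEPLOY_ARGS="--libvirt-type=qemu -t ${OVERCLOUD_DEPLOY_TIMEOUT}"
```

With these illustrative numbers a 93-minute job makes the function print an error and return non-zero, so a caller checking the exit status would stop before the deploy rather than run it with a 33-minute limit.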

Steven Hardy (shardy)
Changed in tripleo:
status: New → Triaged
importance: Undecided → Critical
milestone: none → ongoing
Steven Hardy (shardy) wrote:

So here:

http://logs.openstack.org/31/323431/6/check-tripleo/gate-tripleo-ci-centos-7-nonha/f0ae8e1/console.html#_2016-06-10_08_37_24_634345

we see:

2016-06-10 08:37:24.634345 + echo 'Job timeout set to: 93 minutes'

That is where the 20 minutes comes from.

But here:

http://logs.openstack.org/83/328183/1/check-tripleo/gate-tripleo-ci-centos-7-nonha/461e4a9/console.html#_2016-06-10_09_37_40_144

we see:

2016-06-10 09:37:40.144 | + echo 'Job timeout set to: 159 minutes'

I don't yet understand why the two values differ.

Emilien Macchi (emilienm) wrote:

It's not only the nonha jobs: it happens whenever our jobs run on Zuul slaves. We're missing BUILD_TIMEOUT, which Jenkins used to set but Zuul no longer does. Assigned to Paul, he's on it.
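As a minimal sketch of guarding against this failure mode, assuming a job script that reads BUILD_TIMEOUT when Jenkins provides it, a fallback could at least log when the variable is missing instead of quietly producing a different timeout. The function `job_timeout_mins`, the 180-minute default and the millisecond unit of BUILD_TIMEOUT are assumptions for illustration, not taken from tripleo-ci or Zuul.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: work out the whole-job timeout in minutes.
# Assumes BUILD_TIMEOUT (when Jenkins sets it) is in milliseconds;
# DEFAULT_JOB_TIMEOUT_MINS is an invented fallback for slaves that lack it.
DEFAULT_JOB_TIMEOUT_MINS=${DEFAULT_JOB_TIMEOUT_MINS:-180}

job_timeout_mins() {
    if [ -n "${BUILD_TIMEOUT:-}" ]; then
        # Convert milliseconds to whole minutes.
        echo $(( BUILD_TIMEOUT / 60000 ))
    else
        # Make the fallback visible in the console log.
        echo "WARNING: BUILD_TIMEOUT is not set, assuming ${DEFAULT_JOB_TIMEOUT_MINS} minutes" >&2
        echo "$DEFAULT_JOB_TIMEOUT_MINS"
    fi
}
```

Its output could then feed a deploy-timeout calculation, so a missing variable shows up as a warning in the console log rather than only as an unexpectedly small `-t` value.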

summary: - nonha job failing due to wrong timeout of 20 mins
+ TripleO job sometimes failing due to wrong timeout of 20 mins
Changed in tripleo:
assignee: nobody → Paul Belanger (pabelanger)
Emilien Macchi (emilienm) wrote:

Attempt to fix it in Zuul: https://review.openstack.org/#/c/328298/

Ben Nemec (bnemec) wrote:

This seems to have been fixed on the infra side.

Changed in tripleo:
status: Triaged → Fix Released