Update timeout too long in CI

Bug #1674770 reported by Ben Nemec
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned
Milestone: pike-1

Bug Description

Looking through the logs on http://logs.openstack.org/83/445883/3/check-tripleo/gate-tripleo-ci-centos-7-ovb-updates/05ee057/console.html I see that it started the deployment at about 10:25, then the entire job was killed at 12:18. The deploy timeout is set to 80 minutes, which means we should have errored out long before the job was killed (10:25 plus 80 minutes is only about 11:45).

Unfortunately, because the job hit its overall timeout we get no logs for anything, so this is going to be hard to debug. It seems to be happening on a pretty regular basis right now, though.

Logstash for general gate timeouts: http://logstash.openstack.org/#dashboard/file/logstash.json?query=build_name%3A%20*tripleo-ci*%20AND%20build_status%3A%20FAILURE%20AND%20message%3A%20%5C%22exit%20code%3A%20137%5C%22

Tags: ci
Changed in tripleo:
milestone: none → pike-1
tags: added: alert
Revision history for this message
Steven Hardy (shardy) wrote :

AFAICT the timeout is respected when set directly via heat:

(undercloud) [stack@undercloud ~]$ heat stack-create test -f hosts-config.yaml -e hosts_env.yaml -t 333

(undercloud) [stack@undercloud ~]$ heat stack-show test | grep timeout
WARNING (shell) "heat stack-show" is deprecated, please use "openstack stack show" instead
| timeout_mins | 333
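
For completeness, the non-deprecated equivalent check should be roughly the following (a sketch, assuming the same "test" stack and using osc's -c option to limit the output to the timeout column):

(undercloud) [stack@undercloud ~]$ openstack stack show test -c timeout_mins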

And also via tripleoclient:

openstack overcloud deploy --templates --timeout 333

| timeout_mins | 333

Could this be an issue specific to CI, e.g. have we messed up the deploy arguments?

Revision history for this message
Michele Baldessari (michele) wrote :

So we definitely have -t 80 in the deploy command:
http://logs.openstack.org/83/445883/3/check-tripleo/gate-tripleo-ci-centos-7-ovb-updates/05ee057/console.html#_2017-03-21_10_24_39_793081

2017-03-21 10:24:39.793081 | tripleo.sh -- Deploy command arguments: --libvirt-type=qemu -t 80 -e /usr/share/openstack-tripleo-heat-templates/environments/debug.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation-v6.yaml -e /opt/stack/new/tripleo-ci/test-environments/ipv6-network-templates/network-environment.yaml -e /opt/stack/new/tripleo-ci/test-environments/net-iso.yaml -e /opt/stack/new/tripleo-ci/test-environments/enable-tls-ipv6.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml -e /opt/stack/new/tripleo-ci/test-environments/inject-trust-anchor-hiera-ipv6.yaml --ceph-storage-scale 1 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /opt/stack/new/tripleo-ci/test-environments/worker-config.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml --templates --validation-warnings-fatal

For the update we call:
 /opt/stack/new/tripleo-ci/scripts/tripleo.sh --overcloud-update

And it seems to me we do have -t 80 there as well?
2017-03-21 09:54:41.637335 | +++(/opt/stack/new/tripleo-ci/deploy.env:31): OVERCLOUD_UPDATE_ARGS='-e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml
2017-03-21 09:54:41.637459 | --libvirt-type=qemu -t 80 -e /usr/share/openstack-tripleo-heat-templates/environments/debug.yaml
2017-03-21 09:54:41.637566 | -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml
2017-03-21 09:54:41.637675 | -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation-v6.yaml
2017-03-21 09:54:41.637778 | -e /opt/stack/new/tripleo-ci/test-environments/ipv6-network-templates/network-environment.yaml
2017-03-21 09:54:41.637847 | -e /opt/stack/new/tripleo-ci/test-environments/net-iso.yaml
2017-03-21 09:54:41.637902 | -e /opt/stack/new/tripleo-ci/test-environments/enable-tls-ipv6.yaml
2017-03-21 09:54:41.637970 | -e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml
2017-03-21 09:54:41.638029 | -e /opt/stack/new/tripleo-ci/test-environments/inject-trust-anchor-hiera-ipv6.yaml
2017-03-21 09:54:41.638063 | --ceph-storage-scale 1
2017-03-21 09:54:41.638119 | -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml
2017-03-21 09:54:41.638228 | -e /opt/stack/new/tripleo-ci/test-environments/worker-config.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml'
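
If anyone can get onto a held undercloud, it might also be worth confirming what actually got recorded on the stack for the update (a sketch, assuming the default "overcloud" stack name):

(undercloud) [stack@undercloud ~]$ openstack stack show overcloud | grep timeout_mins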

Revision history for this message
Thomas Herve (therve) wrote :

I think it's worth noting that the timeout value is per operation, not per job. So the 80 minutes apply to the create and to the update separately. Looking at your logs, the update starts at 11:23, so 80 minutes brings it to around 12:43, but the overall job timeout kicks in before that, at 12:18.
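
In other words (times taken from the console log above):

  update starts:            11:23
  + 80 min update timeout:  ~12:43
  job killed:               12:18 (about 25 minutes before the update timeout would ever fire)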

On successful jobs the update seems to take about 30 minutes, so maybe the update timeout should default to something less than 80 minutes so that it can actually kick in before the job timeout?

Revision history for this message
Ben Nemec (bnemec) wrote :

Oh, crud. I totally missed that the create completed and moved on to the update in this job. I think you're right that we need a shorter timeout for update.

Looking at the graphite metrics for the update job, it looks like the average is around 40 minutes when the cloud is heavily loaded. We'd probably need to go to at least 45 minutes to account for normal runtime variations.
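
Something along these lines in tripleo-ci ought to do it, assuming the update args are still assembled into OVERCLOUD_UPDATE_ARGS the way the deploy.env output above shows (just a sketch with an illustrative 60 minute value, not a tested patch):

  # sketch: lower the update timeout from 80 to 60 minutes wherever
  # OVERCLOUD_UPDATE_ARGS gets put together in tripleo-ci
  OVERCLOUD_UPDATE_ARGS="${OVERCLOUD_UPDATE_ARGS/-t 80/-t 60}"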

summary: - Timeout passed to overcloud deploy not effective
+ Update timeout too long in CI
Revision history for this message
Ben Nemec (bnemec) wrote :

Okay, maybe the problem was that I linked the wrong log. I just checked another job and it did indeed fail on create after considerably longer than 80 minutes. Since these are probably separate issues I opened a new bug for that one: https://bugs.launchpad.net/tripleo/+bug/1675174

Revision history for this message
Ben Nemec (bnemec) wrote :
Changed in tripleo:
status: Triaged → Fix Released
tags: removed: alert