stable/newton gate-tripleo-ci-centos-7-nonha-multinode-oooq is broken due to Controller create timeout

Bug #1712436 reported by Alex Schultz
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: John Trowbridge

Bug Description

The overcloud deployment is timing out consistently. It last passed around Aug 16th.

http://logs.openstack.org/88/484388/3/check/gate-tripleo-ci-centos-7-nonha-multinode-oooq/0cbc207/logs/undercloud/home/jenkins/overcloud_deploy.log.txt.gz#_2017-08-22_21_47_25

2017-08-22 21:47:25 | 2017-08-22 20:28:01Z [overcloud.Controller.0.Controller]: CREATE_IN_PROGRESS state changed
2017-08-22 21:47:25 | 2017-08-22 21:47:19Z [overcloud.Controller]: CREATE_FAILED CREATE aborted
2017-08-22 21:47:25 | 2017-08-22 21:47:19Z [overcloud]: CREATE_FAILED Create timed out
2017-08-22 21:47:25 | 2017-08-22 21:47:20Z [overcloud.Controller.0]: CREATE_FAILED CREATE aborted
2017-08-22 21:47:25 | 2017-08-22 21:47:20Z [overcloud.Controller]: CREATE_FAILED Resource CREATE failed: Operation cancelled
2017-08-22 21:47:25 | 2017-08-22 21:47:21Z [overcloud.Controller.0.Controller]: CREATE_FAILED CREATE aborted
2017-08-22 21:47:25 | 2017-08-22 21:47:21Z [overcloud.Controller.0]: CREATE_FAILED Resource CREATE failed: Operation cancelled
2017-08-22 21:47:25 |
2017-08-22 21:47:25 | Stack overcloud CREATE_FAILED
2017-08-22 21:47:25 |
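
The event timestamps in this excerpt show how long the Controller resource sat in CREATE_IN_PROGRESS before the stack gave up. A minimal sketch of pulling that figure out of the log follows; the regular expression and the function names are illustrative and are not part of any TripleO or Heat tool.

```python
import re
from datetime import datetime

# Matches the inner Heat event timestamp (the one ending in "Z"),
# the resource name in brackets, and the status word that follows.
EVENT_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z \[([^\]]+)\]: (\w+)"
)

def parse_events(lines):
    """Return (timestamp, resource, status) tuples for lines that hold an event."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((ts, match.group(2), match.group(3)))
    return events

def time_in_progress(events, resource):
    """Time between the first CREATE_IN_PROGRESS and first CREATE_FAILED event."""
    start = next(ts for ts, name, status in events
                 if name == resource and status == "CREATE_IN_PROGRESS")
    end = next(ts for ts, name, status in events
               if name == resource and status == "CREATE_FAILED")
    return end - start

# Two lines copied from the log excerpt above.
LOG = """\
2017-08-22 21:47:25 | 2017-08-22 20:28:01Z [overcloud.Controller.0.Controller]: CREATE_IN_PROGRESS state changed
2017-08-22 21:47:25 | 2017-08-22 21:47:21Z [overcloud.Controller.0.Controller]: CREATE_FAILED CREATE aborted
"""

elapsed = time_in_progress(parse_events(LOG.splitlines()),
                           "overcloud.Controller.0.Controller")
print(elapsed)  # 1:19:20
```

For this run the resource waited about 79 minutes without completing, which is consistent with a node that never reports back rather than a node that fails quickly.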

summary: stable/newton gate-tripleo-ci-centos-7-nonha-multinode-oooq is broken
+ due to Controller create timeout
Alfredo Moralejo (amoralej) wrote :

I've been hitting this issue in the newton jobs that gate rdoinfo reviews in RDO. I dug into it a bit, and these are my findings:

- The problem in the newton jobs seems to be that the CONTROLLER_HOSTS variable is not being passed when deployed-server/scripts/get-occ-config.sh from tht is called [1], so the os-collect-config service is never enabled.

- The last time the job passed, this variable was passed to the script [2].

- In ocata, the variable is not passed either [3], but the script still gets the host list through the Controller_hosts variable, because get-occ-config.sh in ocata has code that is not present in newton [4].

- Master behaves the same as ocata [5].
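
The branch difference described in these findings can be modelled as a variable lookup. The sketch below is only a model of that behaviour, not the code in get-occ-config.sh: the function name, the `honours_role_var` flag, the precedence order, and the host name `subnode-1` are all assumptions made for illustration.

```python
def controller_hosts(env, honours_role_var):
    """Return the controller host list a script would see, or "" if none.

    env              -- mapping of environment variable names to values
    honours_role_var -- True models a branch that also reads Controller_hosts
                        (ocata and master per the findings above); False
                        models a branch that only reads CONTROLLER_HOSTS.
    """
    hosts = env.get("CONTROLLER_HOSTS", "")
    if not hosts and honours_role_var:
        # Assumed fallback order: legacy name first, role-based name second.
        hosts = env.get("Controller_hosts", "")
    return hosts

# Hypothetical environment in which only the role-based name is exported.
ci_env = {"Controller_hosts": "subnode-1"}

newton_view = controller_hosts(ci_env, honours_role_var=False)
ocata_view = controller_hosts(ci_env, honours_role_var=True)
print(repr(newton_view), repr(ocata_view))
```

With only the role-based name set, the newton model sees an empty host list while the ocata model sees the host, which matches the symptom reported here.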

I'm not sure what changed it. Could someone more familiar with the tripleo-ci scripts take a look?

[1] https://logs.rdoproject.org/42/8642/2/check/rdoinfo-tripleo-newton-testing-centos-7-multinode-1ctlr-featureset005-nv/Z6d568e0d9b9c4e74a43e23828760957b/undercloud/var/log/deployed-server-os-collect-config.log.txt.gz

[2] https://logs.rdoproject.org/66/8566/2/check/rdoinfo-tripleo-newton-testing-centos-7-multinode-1ctlr-featureset005-nv/Z721beaf4349c44f586ef5a263a144722/undercloud/var/log/deployed-server-os-collect-config.log.txt.gz

[3] https://logs.rdoproject.org/42/8642/3/check/rdoinfo-tripleo-ocata-testing-centos-7-multinode-1ctlr-featureset005-nv/Z3f350caf8bda4330b3683d59bd18d726/undercloud/var/log/deployed-server-os-collect-config.log.txt.gz

[4] https://github.com/openstack/tripleo-heat-templates/blob/stable/ocata/deployed-server/scripts/get-occ-config.sh#L19

[5] https://logs.rdoproject.org/42/8642/3/check/rdoinfo-tripleo-master-testing-centos-7-multinode-1ctlr-featureset005-nv/Z3f350caf8bda4330b3683d59bd18d726/undercloud/var/log/deployed-server-os-collect-config.log.txt.gz

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/496566

Alfredo Moralejo (amoralej) wrote :

I'm proposing https://review.openstack.org/#/c/496566/, but I'm not sure it's the right fix.

Ben Nemec (bnemec) wrote :

The breaking commit is almost certainly https://github.com/openstack/tripleo-quickstart-extras/commit/619bbe8bdc433a8c170ec910a61144b30a4ab3c8

I think we should fix that rather than making user interface-affecting changes to a stable branch.

tags: added: quickstart
Alfredo Moralejo (amoralej) wrote :

Yes, that commit seems to be the root cause. I'm abandoning the cherry-pick in THT.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/newton)

Change abandoned by Alfredo Moralejo (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/496566

Changed in tripleo:
milestone: pike-rc1 → pike-rc2
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart-extras (master)

Fix proposed to branch: master
Review: https://review.openstack.org/497958

Changed in tripleo:
assignee: nobody → John Trowbridge (trown)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.openstack.org/497958
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=cf1786d56b30a752aac7ac1ad1624db74d88305e
Submitter: Jenkins
Branch: master

commit cf1786d56b30a752aac7ac1ad1624db74d88305e
Author: John Trowbridge <email address hidden>
Date: Fri Aug 25 11:23:03 2017 -0400

    Make deployed_server_prepare.sh compatible with newton

    In 619bbe8bdc433a8c170ec910a61144b30a4ab3c8 we broke stable/newton.
    This change assumes we only have a single controller role on newton,
    because that is the only jobs we have there.

    Change-Id: I6ac99de5c2a9f3c8e258027a1edbfeffa80cd6b3
    Closes-Bug: 1712436

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-quickstart-extras 2.1.1

This issue was fixed in the openstack/tripleo-quickstart-extras 2.1.1 release.
