stable/newton gate-tripleo-ci-centos-7-nonha-multinode-oooq is broken due to Controller create timeout

Bug #1712436 reported by Alex Schultz on 2017-08-22
This bug affects 2 people
Affects: tripleo
Importance: Critical
Assigned to: John Trowbridge

Bug Description

The overcloud deployment is timing out consistently. It last passed around Aug 16th.

http://logs.openstack.org/88/484388/3/check/gate-tripleo-ci-centos-7-nonha-multinode-oooq/0cbc207/logs/undercloud/home/jenkins/overcloud_deploy.log.txt.gz#_2017-08-22_21_47_25

2017-08-22 21:47:25 | 2017-08-22 20:28:01Z [overcloud.Controller.0.Controller]: CREATE_IN_PROGRESS state changed
2017-08-22 21:47:25 | 2017-08-22 21:47:19Z [overcloud.Controller]: CREATE_FAILED CREATE aborted
2017-08-22 21:47:25 | 2017-08-22 21:47:19Z [overcloud]: CREATE_FAILED Create timed out
2017-08-22 21:47:25 | 2017-08-22 21:47:20Z [overcloud.Controller.0]: CREATE_FAILED CREATE aborted
2017-08-22 21:47:25 | 2017-08-22 21:47:20Z [overcloud.Controller]: CREATE_FAILED Resource CREATE failed: Operation cancelled
2017-08-22 21:47:25 | 2017-08-22 21:47:21Z [overcloud.Controller.0.Controller]: CREATE_FAILED CREATE aborted
2017-08-22 21:47:25 | 2017-08-22 21:47:21Z [overcloud.Controller.0]: CREATE_FAILED Resource CREATE failed: Operation cancelled
2017-08-22 21:47:25 |
2017-08-22 21:47:25 | Stack overcloud CREATE_FAILED
2017-08-22 21:47:25 |
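
To see how long the Controller resource sat in CREATE_IN_PROGRESS before the stack timed out, the Heat event timestamps in the excerpt above can be subtracted. This is only a minimal Python sketch: it assumes the `<wrapper timestamp> | <event timestamp>Z [<resource>]: <status> <reason>` layout seen in this log, and the helper name `parse_events` is made up for illustration.

```python
import re
from datetime import datetime

# Matches the Heat event part of a line such as:
# 2017-08-22 21:47:25 | 2017-08-22 20:28:01Z [overcloud]: CREATE_FAILED Create timed out
# (layout assumed from the excerpt in this report)
EVENT_RE = re.compile(
    r"\| (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z \[([^\]]+)\]: (\S+)"
)


def parse_events(lines):
    """Yield (timestamp, resource, status) for each line that holds a Heat event."""
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            stamp, resource, status = match.groups()
            yield datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), resource, status


# Two lines copied from the deploy log quoted above.
log = [
    "2017-08-22 21:47:25 | 2017-08-22 20:28:01Z [overcloud.Controller.0.Controller]: CREATE_IN_PROGRESS state changed",
    "2017-08-22 21:47:25 | 2017-08-22 21:47:19Z [overcloud]: CREATE_FAILED Create timed out",
]

events = list(parse_events(log))
started = next(t for t, _, status in events if status == "CREATE_IN_PROGRESS")
failed = next(
    t for t, res, status in events if res == "overcloud" and status == "CREATE_FAILED"
)
elapsed = failed - started
print(elapsed)  # 1:19:18
```

For the two lines used here, the Controller resource made no visible progress for about 1 hour 19 minutes before the stack create timed out.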

summary: stable/newton gate-tripleo-ci-centos-7-nonha-multinode-oooq is broken
+ due to Controller create timeout
Alfredo Moralejo (amoralej) wrote :

I've been hitting this issue in the newton jobs that gate rdoinfo reviews in RDO. I've been digging into it, and these are my findings:

- The problem in the newton jobs seems to be that the CONTROLLER_HOSTS variable is not passed when deployed-server/scripts/get-occ-config.sh from tht is called [1], so the script does not enable the os-collect-config service.

- The last time the job passed, this variable was passed to the script [2].

- In ocata, the variable is not passed either [3], but the script picks the hosts up through the Controller_hosts variable, because get-occ-config.sh in ocata has some code that is not present in newton [4].

- In master, the behaviour is the same as in ocata [5].

I'm not sure what changed this. Could someone more familiar with the tripleo-ci scripts take a look?

[1] https://logs.rdoproject.org/42/8642/2/check/rdoinfo-tripleo-newton-testing-centos-7-multinode-1ctlr-featureset005-nv/Z6d568e0d9b9c4e74a43e23828760957b/undercloud/var/log/deployed-server-os-collect-config.log.txt.gz

[2] https://logs.rdoproject.org/66/8566/2/check/rdoinfo-tripleo-newton-testing-centos-7-multinode-1ctlr-featureset005-nv/Z721beaf4349c44f586ef5a263a144722/undercloud/var/log/deployed-server-os-collect-config.log.txt.gz

[3] https://logs.rdoproject.org/42/8642/3/check/rdoinfo-tripleo-ocata-testing-centos-7-multinode-1ctlr-featureset005-nv/Z3f350caf8bda4330b3683d59bd18d726/undercloud/var/log/deployed-server-os-collect-config.log.txt.gz

[4] https://github.com/openstack/tripleo-heat-templates/blob/stable/ocata/deployed-server/scripts/get-occ-config.sh#L19

[5] https://logs.rdoproject.org/42/8642/3/check/rdoinfo-tripleo-master-testing-centos-7-multinode-1ctlr-featureset005-nv/Z3f350caf8bda4330b3683d59bd18d726/undercloud/var/log/deployed-server-os-collect-config.log.txt.gz
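
The variable-name mismatch described in these findings can be illustrated with a small Python sketch. It is not the code of get-occ-config.sh: the function `resolve_role_hosts`, the two name templates and the example address are all assumptions made for illustration, based only on the observation that newton reads CONTROLLER_HOSTS while ocata and master pick the hosts up via Controller_hosts.

```python
def resolve_role_hosts(env, role, name_template):
    """Return the hosts for `role` from the variable named by `name_template`.

    `name_template` may use {role} (as given) or {ROLE} (upper-cased).
    An unset or empty variable yields an empty list.
    """
    name = name_template.format(role=role, ROLE=role.upper())
    return env.get(name, "").split()


# Hypothetical naming schemes, inferred from the findings above only.
NEWTON_NAME = "{ROLE}_HOSTS"   # e.g. CONTROLLER_HOSTS
OCATA_NAME = "{role}_hosts"    # e.g. Controller_hosts

# What the caller exports after the change (example address, not from the logs).
env = {"Controller_hosts": "192.0.2.10"}

print(resolve_role_hosts(env, "Controller", NEWTON_NAME))  # []
print(resolve_role_hosts(env, "Controller", OCATA_NAME))   # ['192.0.2.10']
```

Under these assumptions a newton-style lookup finds no controller hosts, which would be consistent with os-collect-config never being enabled and the Controller resource eventually timing out.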

Alfredo Moralejo (amoralej) wrote :

I'm proposing https://review.openstack.org/#/c/496566/, although I'm not sure it's the right fix.

Ben Nemec (bnemec) wrote :

The breaking commit is almost certainly https://github.com/openstack/tripleo-quickstart-extras/commit/619bbe8bdc433a8c170ec910a61144b30a4ab3c8

I think we should fix that rather than making user interface-affecting changes to a stable branch.

tags: added: quickstart
Alfredo Moralejo (amoralej) wrote :

Yes, that commit seems to be the root cause. I'm abandoning the cherry-pick in THT.

Change abandoned by Alfredo Moralejo (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/496566

Changed in tripleo:
milestone: pike-rc1 → pike-rc2

Fix proposed to branch: master
Review: https://review.openstack.org/497958

Changed in tripleo:
assignee: nobody → John Trowbridge (trown)
status: Triaged → In Progress

Reviewed: https://review.openstack.org/497958
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=cf1786d56b30a752aac7ac1ad1624db74d88305e
Submitter: Jenkins
Branch: master

commit cf1786d56b30a752aac7ac1ad1624db74d88305e
Author: John Trowbridge <email address hidden>
Date: Fri Aug 25 11:23:03 2017 -0400

    Make deployed_server_prepare.sh compatible with newton

    In 619bbe8bdc433a8c170ec910a61144b30a4ab3c8 we broke stable/newton.
    This change assumes we only have a single controller role on newton,
    because that is the only jobs we have there.

    Change-Id: I6ac99de5c2a9f3c8e258027a1edbfeffa80cd6b3
    Closes-Bug: 1712436

Changed in tripleo:
status: In Progress → Fix Released

This issue was fixed in the openstack/tripleo-quickstart-extras 2.1.1 release.
