oooq now believes that overcloud deploy succeeded even though it failed

Bug #1674955 reported by Michele Baldessari
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Sagi (Sergey) Shnaidman

Bug Description

Since change Ib9342ea169229dc8747579b2ed96bdd396b9f314, we rely solely on the exit status of the overcloud deploy command to decide whether a deploy failed (which seems rather sensible). However, it appears that when an overcloud deploy fails, oooq now considers it successful. For example:
http://logs.openstack.org/79/445479/3/check/gate-tripleo-ci-centos-7-nonha-multinode-oooq/2da08a1/logs/undercloud/home/jenkins/overcloud_deploy.log.txt.gz#_2017-03-22_09_38_06

We see it failed:
2017-03-22 08:26:00Z [overcloud.AllNodesDeploySteps.CephStorageDeployment_Step3]: CREATE_COMPLETE state changed
2017-03-22 09:38:06Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED CREATE aborted
2017-03-22 09:38:06Z [overcloud]: CREATE_FAILED Create timed out
2017-03-22 09:38:06Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step3]: CREATE_FAILED CREATE aborted
2017-03-22 09:38:06Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED Resource CREATE failed: Operation cancelled
Heat Stack create failed.

Yet, unless I am mistaken, oooq considered it successful:
http://logs.openstack.org/79/445479/3/check/gate-tripleo-ci-centos-7-nonha-multinode-oooq/2da08a1/console.html#_2017-03-22_09_39_06_534388

2017-03-22 09:39:06.534388 | TASK [did the deployment pass or fail?] ****************************************
2017-03-22 09:39:06.534458 | task path: /home/jenkins/workspace/gate-tripleo-ci-centos-7-nonha-multinode-oooq/.quickstart/playbooks/multinode.yml:93
2017-03-22 09:39:06.570905 | Wednesday 22 March 2017 09:39:06 +0000 (0:00:00.120) 2:03:23.690 *******
2017-03-22 09:39:06.618626 | ok: [localhost] => {
2017-03-22 09:39:06.618723 | "failed": false,
2017-03-22 09:39:06.619443 | "failed_when_result": false,
2017-03-22 09:39:06.620060 | "overcloud_deploy_result": "passed"
2017-03-22 09:39:06.620111 | }
2017-03-22 09:39:06.647867 |
2017-03-22 09:39:06.647959 | PLAY [validate the overcloud] **************************************************

Note: I am setting this to critical because it has become very confusing to work out why a job failed (the console log implies that the deployment passed).
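
To illustrate the mismatch described above, here is a minimal Python sketch that reads Heat event lines in the format quoted in this report and checks whether the top-level stack logged a failure. The function name `stack_failed` and the regular expression are hypothetical, written only for this illustration; this is not code from oooq or tripleoclient.

```python
import re

# Matches Heat event lines of the form quoted in this report, e.g.:
#   2017-03-22 09:38:06Z [overcloud]: CREATE_FAILED Create timed out
# The pattern is an assumption based only on the lines shown above.
EVENT_RE = re.compile(
    r"^\S+ \S+ \[(?P<resource>[^\]]+)\]: (?P<status>[A-Z_]+)\b"
)


def stack_failed(log_text, stack_name="overcloud"):
    """Return True if the top-level stack logged a *_FAILED event."""
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match is None:
            continue  # not an event line (e.g. free-form client output)
        if match.group("resource") != stack_name:
            continue  # nested resource; only the top-level stack counts here
        if match.group("status").endswith("_FAILED"):
            return True
    return False
```

Run against the excerpt above, a check of this kind would report a failure because of the `[overcloud]: CREATE_FAILED` line, whereas the console output shows `"overcloud_deploy_result": "passed"`.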

Tags: ci
Changed in tripleo:
assignee: nobody → Sagi (Sergey) Shnaidman (sshnaidm)
Sagi (Sergey) Shnaidman (sshnaidm) wrote :
Michele Baldessari (michele) wrote :

The underlying return code issue is tracked here: https://bugs.launchpad.net/bugs/1674982

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.openstack.org/448541
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=edaea82aa9349c4695acb1a8ae1a5acb963eded7
Submitter: Jenkins
Branch: master

commit edaea82aa9349c4695acb1a8ae1a5acb963eded7
Author: Sagi Shnaidman <email address hidden>
Date: Wed Mar 22 14:14:26 2017 +0200

    Exit with error code when overcloud wasn't deployed

    If overcloud stack wasn't deployed successfully, exit with code 1
    without depending on exit code of tripleoclient.

    Close-Bug: #1674955
    Change-Id: I391bec38b1f81901efce244486a67a7b8b43327f
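
The idea in the commit message (fail whenever the stack was not deployed, whatever the client returned) can be sketched roughly as follows. This is only an illustration in Python under stated assumptions, not the merged change: the function `deploy_exit_code`, its parameters, and the set of statuses treated as success are all hypothetical.

```python
# Heat stack statuses treated as "deployed" (an assumption for this sketch).
SUCCESS_STATUSES = {"CREATE_COMPLETE", "UPDATE_COMPLETE"}


def deploy_exit_code(client_rc, stack_status):
    """Pick the exit code for the deploy step.

    client_rc    -- exit code reported by the deploy command
    stack_status -- status of the overcloud stack, queried separately
    """
    if stack_status not in SUCCESS_STATUSES:
        # Stack was not deployed: exit with 1 even if the client said 0.
        return 1
    # Stack looks fine; still propagate any client-side error.
    return client_rc
```

With the situation from this report, `deploy_exit_code(0, "CREATE_FAILED")` returns 1, so the job would be marked as failed even though the client exited with 0.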

Changed in tripleo:
status: Triaged → Fix Released
tags: removed: alert