3node jobs failing due to missing file UpgradeInitDeployment

Bug #1786520 reported by Rafael Folco
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

2018-08-10 12:30:19 | fatal: [centos-7-rax-dfw-0001265364]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'file'. Error was a <class 'ansible.errors.AnsibleError'>, original message: could not locate file in lookup: ControllerApi/centos-7-rax-dfw-0001265357/UpgradeInitDeployment"}

http://logs.openstack.org/19/570719/34/check/tripleo-ci-centos-7-3nodes-multinode/ad08b2b/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2018-08-10_12_30_19

Compared with a successful job, the failing job seems to be missing some tasks, such as
[Render deployment file for UpgradeInitDeployment]
http://logs.openstack.org/21/582521/5/check/tripleo-ci-centos-7-3nodes-multinode/a461707/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2018-08-10_11_35_42

Facts:
- This seems to be similar to https://bugs.launchpad.net/tripleo/+bug/1784078
- 3node jobs failed 70% of the time today
- Fails for master and stable branches

Tags: ci
Marios Andreou (marios-b) wrote :

poked a little to mark it triaged (just noticed it already is)... anyway I quickly found 3 more examples of this [0][1][2] from recent runs at [3]

[0] http://logs.openstack.org/45/560445/122/check/tripleo-ci-centos-7-3nodes-multinode/ec7b45c/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2018-08-21_01_45_25

 2018-08-21 01:45:25 | fatal: [centos-7-inap-mtl01-0001422621]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'file'. Error was a <class 'ansible.errors.AnsibleError'>, original message: could not locate file in lookup: ControllerApi/centos-7-inap-mtl01-0001422620/UpgradeInitDeployment"}

[1] http://logs.openstack.org/19/581919/22/check/tripleo-ci-centos-7-3nodes-multinode/2638bd8/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2018-08-21_01_30_16

 2018-08-21 01:30:18 | or'>, original message: could not locate file in lookup: ControllerApi/centos-7-rax-iad-0001422546/UpgradeInitDeployment"}

[2] http://logs.openstack.org/03/593903/3/check/tripleo-ci-centos-7-3nodes-multinode/17d85b2/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2018-08-20_23_58_10

 2018-08-20 23:58:10 | fatal: [centos-7-ovh-bhs1-0001421868]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'file'. Error was a <class 'ansible.errors.AnsibleError'>, original message: could not locate file in lookup: ControllerApi/centos-7-ovh-bhs1-0001421871/UpgradeInitDeployment"}
 2018-08-20 23:58:10 |

[3] http://cistatus.tripleo.org/ "tripleo-ci-centos-7-3nodes-multinode"

Ronelle Landy (rlandy) wrote :

I looked into this failure a while back when we reparented the zuul jobs:

http://logs.openstack.org/76/581376/6/check/tripleo-ci-centos-7-3nodes-multinode/4708903/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2018-07-12_22_59_54

Basically ... ControllerApi/centos-7-inap-mtl01-0000693370/UpgradeInitDeployment does not exist - ControllerApi/centos-7-inap-mtl01-0000693368/UpgradeInitDeployment does

Looking at http://logs.openstack.org/76/581376/6/check/tripleo-ci-centos-7-3nodes-multinode/4708903/logs/undercloud/home/zuul/hostnamemap.yaml.txt.gz - the job seems to be confusing the two hosts listed there: overcloud-controllerapi-0: centos-7-inap-mtl01-0000693368 and overcloud-controller-0: centos-7-inap-mtl01-0000693370
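
To make the mismatch concrete, here is a minimal Python sketch. It is not code from tripleo-common: the function names and the `rendered` set are hypothetical, and the only data used are the hostname map entries and paths quoted above. It builds the role/host/deployment path in the shape seen in the error message and checks it against the one path reported to exist.

```python
# Hostname map entries as quoted from the failing job's hostnamemap.yaml.
hostname_map = {
    "overcloud-controllerapi-0": "centos-7-inap-mtl01-0000693368",
    "overcloud-controller-0": "centos-7-inap-mtl01-0000693370",
}

# The one rendered deployment path reported to exist (assumption: stored as a set).
rendered = {
    "ControllerApi/centos-7-inap-mtl01-0000693368/UpgradeInitDeployment",
}


def lookup_path(role, host, deployment):
    """Build a relative path in the role/host/deployment shape seen in the error."""
    return f"{role}/{host}/{deployment}"


def check_lookup(role, host, deployment, rendered_paths):
    """Return (path, found) for a single role/host/deployment lookup."""
    path = lookup_path(role, host, deployment)
    return path, path in rendered_paths


# Host the failing job paired with ControllerApi vs. the host in the map.
failing = check_lookup(
    "ControllerApi", "centos-7-inap-mtl01-0000693370",
    "UpgradeInitDeployment", rendered)
mapped = check_lookup(
    "ControllerApi", hostname_map["overcloud-controllerapi-0"],
    "UpgradeInitDeployment", rendered)
print(failing)
print(mapped)
```

Under these assumptions the first lookup is not found and the second is, which matches what the log and the hostname map show.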

This is the failing line:
https://github.com/openstack/tripleo-common/blob/master/tripleo_common/templates/deployments.yaml#L3

The nodes file: https://github.com/openstack/tripleo-quickstart/blob/master/config/nodes/2ctlr.yml relies on the nodes being assigned in a particular order. Sometimes this works out and sometimes it does not.
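
A rough illustration of why order dependence is fragile, in Python. This is only a sketch under assumptions: `assign_by_position` and the role list are hypothetical and do not reflect how 2ctlr.yml or tripleo-quickstart actually assign nodes. It just shows that pairing roles with nodes by index gives a different result when the nodes arrive in a different order.

```python
def assign_by_position(roles, nodes):
    """Pair each role with the node at the same index (order-dependent)."""
    return dict(zip(roles, nodes))


# Hypothetical role order; the node names are the two hosts from the job above.
roles = ["ControllerApi", "Controller"]

expected = assign_by_position(
    roles,
    ["centos-7-inap-mtl01-0000693368", "centos-7-inap-mtl01-0000693370"],
)
swapped = assign_by_position(
    roles,
    ["centos-7-inap-mtl01-0000693370", "centos-7-inap-mtl01-0000693368"],
)

print(expected["ControllerApi"])
print(swapped["ControllerApi"])
```

With the swapped order, ControllerApi ends up paired with centos-7-inap-mtl01-0000693370, which is the pairing the failing lookup used.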

Ronelle Landy (rlandy) wrote :

<slagle> rlandy: ultimately, the job has no maintainer. i've not maintained since someone else took it over and transitioned it to oooq
<slagle> rlandy: without a maintainer, it should just be removed

Ronelle Landy (rlandy) wrote :

shardy added a review to the overcloud role to make this easier to define later - should look into this

Changed in tripleo:
milestone: rocky-rc1 → rocky-rc2
Changed in tripleo:
milestone: rocky-rc2 → stein-1
Martin Kopec (mkopec) wrote :
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Fix Released
Marios Andreou (marios-b) wrote :