overcloud stack update attempts to redeploy servers which have already been deployed

Bug #1789462 reported by John Fulton
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   Critical     Unassigned

Bug Description

Using rocky rc1, I deployed 3 controller, 1 compute and 3 ceph nodes without errors. I then simply re-ran the same 'openstack overcloud deploy ...' command to run a stack update, as you would to reassert a configuration change, and the stack update failed.

Though the failure resulted in placement errors like "No valid host was found", this seems to be only a side effect of a larger problem outside of Nova, because Nova shouldn't have been asked to create any new resources. I.e. an `openstack server list` after the stack update showed 6 controller, 2 compute and 6 ceph nodes, where the new ones were all in status ERROR while the existing ones were in status ACTIVE. The deployment tries to create new nodes rather than reuse the existing ones.
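A minimal sketch of how the symptom above could be checked from a script, assuming the server list has been saved with `openstack server list -f json` and that each entry carries `Name` and `Status` keys. The helper name `group_by_status` and the sample node names are hypothetical and only illustrate the idea; they are not taken from the affected deployment.

```python
import json
from collections import defaultdict


def group_by_status(servers):
    """Group server names by their Nova status (e.g. ACTIVE, ERROR)."""
    grouped = defaultdict(list)
    for server in servers:
        grouped[server["Status"]].append(server["Name"])
    return dict(grouped)


# Hypothetical, shortened sample of `openstack server list -f json` output.
SAMPLE = """
[
  {"Name": "overcloud-controller-0", "Status": "ACTIVE"},
  {"Name": "overcloud-controller-1", "Status": "ACTIVE"},
  {"Name": "overcloud-controller-3", "Status": "ERROR"}
]
"""

if __name__ == "__main__":
    by_status = group_by_status(json.loads(SAMPLE))
    # Any server in ERROR after a no-op stack update suggests that Heat
    # asked Nova to create replacements instead of reusing existing nodes.
    for name in by_status.get("ERROR", []):
        print("unexpected server in ERROR:", name)
```

Comparing the number of ACTIVE servers before and after the update would be another way to confirm that the original nodes were left untouched.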

An idempotence test that just reasserts the configuration by re-running 'openstack overcloud deploy' would pick this up. However, the bug cannot be reproduced when that test uses already-deployed servers [1], which might explain why upstream CI didn't reproduce this bug.

[1] https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/deployed_server.html

Changed in tripleo:
status: New → Triaged
Michele Baldessari (michele) wrote :

So I noticed that we do not install openstack-tripleo-common in the heat containers and so we miss the undercloud heat plugins:
(undercloud) [root@undercloud-0 nova]# for i in $(docker ps |grep heat|awk '{print $1}'); do docker exec -it $i sh -c 'ls -l /usr/lib/heat/'; done
ls: cannot access /usr/lib/heat/: No such file or directory
ls: cannot access /usr/lib/heat/: No such file or directory
ls: cannot access /usr/lib/heat/: No such file or directory
ls: cannot access /usr/lib/heat/: No such file or directory

I checked on queens on a non-containerized undercloud and we still ship them:
[root@undercloud-0 undercloud_heat_plugins]# ls -l /usr/lib/heat/undercloud_heat_plugins/
total 44
-rw-r--r--. 1 root root 1155 Jul 5 13:29 config.py
-rw-r--r--. 2 root root 1079 Aug 13 21:25 config.pyc
-rw-r--r--. 2 root root 1079 Aug 13 21:25 config.pyo
-rw-r--r--. 1 root root 1745 Jul 5 13:29 immutable_resources.py
-rw-r--r--. 2 root root 2477 Aug 13 21:25 immutable_resources.pyc
-rw-r--r--. 2 root root 2477 Aug 13 21:25 immutable_resources.pyo
-rw-r--r--. 1 root root 0 Jul 5 13:29 __init__.py
-rw-r--r--. 2 root root 136 Aug 13 21:25 __init__.pyc
-rw-r--r--. 2 root root 136 Aug 13 21:25 __init__.pyo
-rw-r--r--. 1 root root 1390 Jul 5 13:29 server_update_allowed.py
-rw-r--r--. 2 root root 1418 Aug 13 21:25 server_update_allowed.pyc
-rw-r--r--. 2 root root 1418 Aug 13 21:25 server_update_allowed.pyo

Not 100% sure this is the culprit though
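As a rough sketch of the check performed above, the following Python function reports which of the plugin modules shown in the queens listing are absent from a given directory. The function name `missing_plugins` is hypothetical, and the expected file list is taken only from the listing in this comment; it would be run inside each heat container against `/usr/lib/heat/undercloud_heat_plugins/`.

```python
from pathlib import Path

# Plugin modules seen in the non-containerized queens undercloud listing.
EXPECTED_PLUGINS = (
    "__init__.py",
    "config.py",
    "immutable_resources.py",
    "server_update_allowed.py",
)


def missing_plugins(plugin_dir, expected=EXPECTED_PLUGINS):
    """Return the sorted names of expected plugin files not found in plugin_dir."""
    root = Path(plugin_dir)
    if not root.is_dir():
        # A missing directory means every plugin is missing.
        return sorted(expected)
    return sorted(name for name in expected if not (root / name).is_file())


if __name__ == "__main__":
    missing = missing_plugins("/usr/lib/heat/undercloud_heat_plugins/")
    if missing:
        print("missing undercloud heat plugins:", ", ".join(missing))
    else:
        print("all expected undercloud heat plugins are present")
```

An empty result only shows that the files exist; it does not prove that heat actually loaded them.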

Rabi Mishra (rabi) wrote :

At first glance this looks like the plugins issue, as it seems like the servers are being replaced.

This was supposed to be fixed in rc1 with https://review.openstack.org/#/c/588529/.

From comment #1 it does not look like the volume is mounted. Can we recheck whether the fix made it into the rc1 images?

Michele Baldessari (michele) wrote :

Current theory is that https://review.openstack.org/#/c/588529 was missing on the undercloud when redeploying.

Michele Baldessari (michele) wrote :

Theory at comment #3 was confirmed. Closing this one. John, scream if you disagree.

Changed in tripleo:
status: Triaged → Fix Released
John Fulton (jfulton-org) wrote :

+1
