enable-ssh-admin.sh may fail on subsequent deployments
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Fix Released | High | Ben Nemec |
Bug Description
I've been seeing a number of failures to run enable-ssh-admin.sh since config-download was turned on by default. For example:
(undercloud) [centos@
Waiting for messages on queue 'tripleo' with no timeout.
Removing the current plan files
Uploading new plan files
Plan updated.
Processing templates in the directory /tmp/tripleocli
Deploying templates in the directory /tmp/tripleocli
Stack overcloud/
Deploying overcloud configuration
/usr/share/
There are two problems with this: 1) The script failed and 2) there's no indication of why.
It turns out that 1) is explained by SSH host keys. On subsequent deploys, if the same IP address is selected for an overcloud node, there may already be a host key for that address in known_hosts. The redeployed node presents a different key, so unless the old entry is removed first, ssh refuses the connection and enable-ssh-admin.sh fails.
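The cleanup step described above can be sketched as follows. This is a minimal illustration, not the fix that was merged: the function name `remove_known_host` and the addresses in the usage note are hypothetical, and it only handles plain-text entries. Hashed entries (lines starting with `|1|`) and `[host]:port` forms are left untouched; in practice `ssh-keygen -R <host>` covers those cases as well.

```python
from pathlib import Path


def remove_known_host(known_hosts: Path, host: str) -> int:
    """Drop plain-text known_hosts entries naming `host`; return the number removed.

    Hypothetical helper for illustration only. Hashed entries are not matched.
    """
    if not known_hosts.exists():
        return 0

    kept = []
    removed = 0
    for line in known_hosts.read_text().splitlines():
        stripped = line.strip()
        # Preserve blank lines and comments unchanged.
        if not stripped or stripped.startswith("#"):
            kept.append(line)
            continue
        fields = stripped.split()
        # An optional marker (@cert-authority or @revoked) precedes the host field.
        if fields[0].startswith("@") and len(fields) > 1:
            host_field = fields[1]
        else:
            host_field = fields[0]
        # The host field is a comma-separated list of names and addresses.
        if host in host_field.split(","):
            removed += 1
        else:
            kept.append(line)

    known_hosts.write_text("".join(entry + "\n" for entry in kept))
    return removed
```

Calling something like `remove_known_host(Path.home() / ".ssh" / "known_hosts", "192.168.24.10")` before each SSH attempt would clear a stale entry for that (made-up) address, so a reused IP no longer trips host key verification.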
2) happens because the info log level is off by default in tripleoclient, so the script's output is logged but never shown. Because 1) is intermittent, this makes debugging very difficult: a subsequent run with --debug may not fail.
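The logging behaviour can be reproduced with the standard `logging` module. The sketch below is an assumption-laden illustration rather than tripleoclient's actual code: the logger name, the WARNING default, and the logged message are all made up for the example. It shows only that output logged at INFO disappears unless the level is lowered.

```python
import io
import logging


def run_and_log(debug: bool) -> str:
    """Log a script's output at INFO and return what actually reached the handler."""
    stream = io.StringIO()
    logger = logging.getLogger("demo.enable_ssh_admin")  # hypothetical logger name
    logger.handlers.clear()
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(stream))
    # Assumption mirroring the report: WARNING by default, DEBUG only with --debug.
    logger.setLevel(logging.DEBUG if debug else logging.WARNING)
    # Illustrative script output, not taken from the bug's log.
    logger.info("Host key verification failed.")
    return stream.getvalue()


print(repr(run_and_log(debug=False)))
print(repr(run_and_log(debug=True)))
```

With `debug=False` the captured output is empty, which matches the symptom of a failure with no explanation. Logging the script's output at a level that is visible by default, at least when the script exits non-zero, would avoid needing a second run to see the cause.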
Fix proposed to branch: master
Review: https://review.openstack.org/564273