create_admin_via_nova returns before the ssh key is installed on all nodes

Bug #1720793 reported by John Fulton
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Giulio Fidente

Bug Description

When Mistral kicks off Ceph-Ansible, I am seeing issues like the following:

2017-09-29 15:38:10,768 p=19459 u=mistral | TASK [ceph-defaults : is ceph running already?] ********************************
2017-09-29 15:38:10,780 p=19459 u=mistral | [DEPRECATION WARNING]: always_run is deprecated. Use check_mode = no instead..
2017-09-29 15:38:11,180 p=19459 u=mistral | fatal: [192.168.24.56]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Could not create directory '/home/mistral/.ssh'.\r\nssh_exchange_identification: Connection closed by remote host\r\n", "unreachable": true}
2017-09-29 15:38:11,181 p=19459 u=mistral | fatal: [192.168.24.71]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Could not create directory '/home/mistral/.ssh'.\r\nssh_exchange_identification: Connection closed by remote host\r\n", "unreachable": true}
2017-09-29 15:38:11,188 p=19459 u=mistral | [DEPRECATION WARNING]: always_run is deprecated. Use check_mode = no instead..

This causes the deployment to fail because the hosts are reported as unreachable.

However, I am able to log in to the hosts that are reported with unreachable=1.

This has only become a problem since growing the overcloud deployment to 3 controllers, 3 ceph nodes, and 26 compute nodes (deployed at once).

John Fulton (jfulton-org) wrote :

Workaround: in /usr/share/ceph-ansible/ansible.cfg, set retry = 5
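For illustration, the effect of the retry workaround above can be sketched in Python. This is a minimal sketch under assumptions, not Ansible's implementation: `connect_with_retries` and its parameters are hypothetical names, and the `connect` callable stands in for a single SSH connection attempt to a node.

```python
import time


def connect_with_retries(connect, retries=5, delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the retries are exhausted.

    `connect` is a hypothetical zero-argument callable standing in for one
    SSH connection attempt; it should raise ConnectionError on failure.
    """
    attempts = retries + 1  # the first attempt plus `retries` retries
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            sleep(delay)  # give the node time to finish installing the key
```

Retrying only hides the race: a node whose key takes longer to install than the total retry window would still be reported as unreachable.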

description: updated
summary: - ceph-ansible starts before hosts are ready
+ create_admin_via_nova returns before the ssh key is installed on all nodes

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (master)

Fix proposed to branch: master
Review: https://review.openstack.org/509001

Changed in tripleo:
status: Triaged → In Progress
Changed in tripleo:
milestone: none → queens-1
tags: added: pike-backport-potential

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/510970

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/509001
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=47e66a81681b8327b5d1c284e54bb0495e2f4872
Submitter: Jenkins
Branch: master

commit 47e66a81681b8327b5d1c284e54bb0495e2f4872
Author: Giulio Fidente <email address hidden>
Date: Mon Oct 2 22:44:18 2017 +0200

    Ensure ssh key is active before returning from create_admin_via_nova

    We need to make sure os-collect-config has pulled in the new
    software deployment and committed the changes before returning.

    Also sets the ceph-ansible playbook retries to 3 to make sure we
    don't fail unnecessarily on unpredictable network issues.

    Change-Id: I544abf5053f18984d93cf381812372029f4ce498
    Closes-Bug: #1720793
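The idea in the commit message above, not returning until every node has applied the change, can be sketched as a polling loop. This is only an illustrative sketch under assumptions, not the tripleo-common change itself: `wait_until_all_ready`, `is_ready`, and the timeout values are hypothetical, and `is_ready(host)` stands in for whatever check confirms that a node has applied the deployment.

```python
import time


def wait_until_all_ready(hosts, is_ready, timeout=600.0, interval=5.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Block until is_ready(host) is true for every host, or time out.

    `is_ready` is a hypothetical callable that returns True once a node
    has applied the change (for example, once the ssh key is in place).
    """
    pending = set(hosts)
    deadline = clock() + timeout
    while True:
        # Re-check only the hosts that have not confirmed yet.
        pending = {host for host in pending if not is_ready(host)}
        if not pending:
            return
        if clock() >= deadline:
            raise TimeoutError("hosts not ready: %s" % sorted(pending))
        sleep(interval)
```

Gating on all hosts, rather than on the first success, matters for large deployments such as the 32-node one in the description, where individual nodes can finish at very different times.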

Changed in tripleo:
status: In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/pike)

Reviewed: https://review.openstack.org/510970
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=f27b723b0a3ebade2f687a6c7f81a5df7aedddcc
Submitter: Zuul
Branch: stable/pike

commit f27b723b0a3ebade2f687a6c7f81a5df7aedddcc
Author: Giulio Fidente <email address hidden>
Date: Mon Oct 2 22:44:18 2017 +0200

    Ensure ssh key is active before returning from create_admin_via_nova

    We need to make sure os-collect-config has pulled in the new
    software deployment and committed the changes before returning.

    Also sets the ceph-ansible playbook retries to 3 to make sure we
    don't fail unnecessarily on unpredictable network issues.

    Change-Id: I544abf5053f18984d93cf381812372029f4ce498
    Closes-Bug: #1720793
    (cherry picked from commit 47e66a81681b8327b5d1c284e54bb0495e2f4872)

tags: added: in-stable-pike

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 8.1.0

This issue was fixed in the openstack/tripleo-common 8.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 7.6.3

This issue was fixed in the openstack/tripleo-common 7.6.3 release.
