tripleo

Bug #1792343
Comment #14

OpenStack Infra (hudson-openstack) wrote on 2018-09-21: Fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/604171
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=c0f41cae9f672c21f05fa7b0cfbfeb66d1cfe296
Submitter: Zuul
Branch: master

commit c0f41cae9f672c21f05fa7b0cfbfeb66d1cfe296
Author: James Slagle <email address hidden>
Date: Thu Sep 20 13:36:03 2018 -0400

Set SSH server keep alive options

When os-net-config applies the network configuration on the overcloud nodes,
ssh connections can be dropped.

Since ssh retries is set to 8 in ansible.cfg, ansible retries the task because
it failed with an ssh connection error.

However, the first task is actually still running and eventually succeeds.

The second task, kicked off by ansible as a retry, sees that the deployment is
already applied, but the notification file (*.notify.json) does not yet exist
because the first task is still in progress. This causes the second task to
fail with the error reported in the bug, and the whole ansible-playbook run
then fails.

Setting the ServerAliveInterval and ServerAliveCountMax ssh options seems to fix
the issue, as ssh no longer drops the first connection when these are configured.
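
As a rough illustration of the kind of configuration described above, the
following sketch generates an ansible.cfg [ssh_connection] section that sets
both keep alive options alongside the retry count. It is not the code from this
commit: the helper name build_ansible_cfg and the interval and count values
are assumptions chosen for the example; only the retries value of 8 comes from
the text above.

```python
import configparser
import io


def build_ansible_cfg(alive_interval=5, alive_count_max=5, retries=8):
    """Return ansible.cfg text with ssh keep alive options set.

    Hypothetical helper for illustration. The default interval and count
    are assumed values, not the ones used by the actual change.
    """
    config = configparser.ConfigParser()
    config.add_section("ssh_connection")

    # Number of times ansible retries a task after an ssh connection error.
    config.set("ssh_connection", "retries", str(retries))

    # Extra options passed to the ssh client. Setting ssh_args replaces
    # ansible's default ssh_args, so any other options still needed would
    # have to be listed here as well.
    ssh_args = " ".join([
        "-o ServerAliveInterval={}".format(alive_interval),
        "-o ServerAliveCountMax={}".format(alive_count_max),
    ])
    config.set("ssh_connection", "ssh_args", ssh_args)

    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(build_ansible_cfg())
```

With these example values the ssh client sends a keep alive probe after 5
seconds without data from the server and gives up after 5 unanswered probes,
that is, after roughly 25 seconds of silence.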

Change-Id: I08781fe2aa6472d3fae5c5f5d0babd1f7a3b9b2d
Closes-Bug: #1792343