Comment 14 for bug 1792343

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/604171
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=c0f41cae9f672c21f05fa7b0cfbfeb66d1cfe296
Submitter: Zuul
Branch: master

commit c0f41cae9f672c21f05fa7b0cfbfeb66d1cfe296
Author: James Slagle <email address hidden>
Date: Thu Sep 20 13:36:03 2018 -0400

    Set SSH server keep alive options

    When os-net-config applies the network configuration on the overcloud nodes,
    ssh connections can be dropped.

    Since we have ssh retries set to 8 in ansible.cfg, ansible retries the task
    because it failed with an ssh connection error.

    However, the first task is actually still running, and it eventually succeeds.

    The second task, kicked off by ansible as a retry, sees that the deployment
    is already applied, but the notification file (*.notify.json) does not yet
    exist because the first task is still in progress. This causes the second
    task to fail with the error reported in the bug, which in turn fails the
    whole ansible-playbook run.

    Setting the ServerAliveInterval and ServerAliveCountMax ssh options seems to
    fix the issue, as ssh no longer drops the first connection when these are
    configured.

    Change-Id: I08781fe2aa6472d3fae5c5f5d0babd1f7a3b9b2d
    Closes-Bug: #1792343
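
For illustration only, here is a minimal sketch of how the two keep-alive options named in the commit message could be written into the [ssh_connection] section of an ansible.cfg. It is not the code from the merged change: the helper name build_ansible_cfg and the interval and count values are assumptions chosen for the example, and only the retries value of 8 comes from the commit message. Note that setting ssh_args replaces Ansible's default value for that setting, so a real configuration would likely carry additional options.

```python
import configparser
import io


def build_ansible_cfg(interval=5, count_max=5, retries=8):
    """Render an ansible.cfg fragment with ssh keep-alive options.

    Hypothetical helper: interval and count_max are illustrative values,
    not the ones used by the merged change.
    """
    config = configparser.ConfigParser()
    config.add_section("ssh_connection")

    # Number of times ansible retries a task after an ssh connection error.
    config.set("ssh_connection", "retries", str(retries))

    # ServerAliveInterval: seconds of silence before the client sends a
    # keep-alive probe. ServerAliveCountMax: unanswered probes tolerated
    # before the client gives up on the connection.
    ssh_args = " ".join([
        "-o ServerAliveInterval={}".format(interval),
        "-o ServerAliveCountMax={}".format(count_max),
    ])
    config.set("ssh_connection", "ssh_args", ssh_args)

    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(build_ansible_cfg())
```

With these example values, the ssh client would tolerate roughly interval * count_max seconds without a reply from the server before giving up on the connection.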