When os-net-config applies the network configuration on the overcloud nodes,
SSH connections can be dropped.
Because SSH retries are set to 8 in ansible.cfg, Ansible retries any task that
fails with an SSH connection error.
However, the first run of the task is actually still executing on the node and
eventually succeeds. The retried run, started by Ansible, sees that the
deployment has already been applied, but the notification file (*.notify.json)
does not exist yet because the first run is still in progress. The retry
therefore fails with the error reported in the bug, and the whole
ansible-playbook run fails with it.
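The race can be illustrated with a minimal Python sketch. It assumes a simplified model in which a node has just two pieces of state, "deployment applied" and "notify file written"; the class and function names (`Node`, `first_run`, `retried_run`, `simulate`) are hypothetical and are not TripleO code. Two events force the ordering described above: the retry happens after the first run has applied the deployment but before it has written the notify file.

```python
import threading


class Node:
    """Hypothetical, simplified state of one overcloud node."""

    def __init__(self):
        self.deployment_applied = False
        self.notify_file_exists = False  # stands in for *.notify.json
        self.lock = threading.Lock()


def first_run(node, applied, release):
    # The original task applies the deployment...
    with node.lock:
        node.deployment_applied = True
    applied.set()
    # ...and is still working when the SSH connection drops.
    release.wait()
    # Only at the end does it write the notify file.
    with node.lock:
        node.notify_file_exists = True


def retried_run(node):
    # The retry checks the same state the first run is still producing.
    with node.lock:
        if node.deployment_applied and not node.notify_file_exists:
            return "failed: deployment applied but notify file missing"
        return "ok"


def simulate():
    node = Node()
    applied = threading.Event()
    release = threading.Event()
    worker = threading.Thread(target=first_run, args=(node, applied, release))
    worker.start()
    applied.wait()              # SSH drops here; Ansible retries the task
    retry_result = retried_run(node)
    release.set()               # the first run eventually completes
    worker.join()
    return retry_result, node.notify_file_exists


print(simulate())
```

Running it shows the retry failing even though the first run finishes successfully afterwards, which matches the failure mode reported in the bug.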
Setting the ServerAliveInterval and ServerAliveCountMax SSH options appears to
fix the issue: with them configured, SSH does not drop the first connection.
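As a sketch of how these options fit together, the Python helpers below build the `-o` option string and compute how long SSH waits before giving up on an unresponsive server. Per ssh_config(5), the client sends a keepalive every ServerAliveInterval seconds and disconnects after ServerAliveCountMax of them go unanswered, so the timeout is roughly their product (the man page's own example: interval 15 with the default count of 3 gives about 45 seconds). The helper names and the values 5/5 are hypothetical; the values actually chosen in the commit are not stated here.

```python
def keepalive_ssh_args(interval: int, count_max: int) -> str:
    """Build SSH -o options enabling server keepalives (hypothetical helper)."""
    # ServerAliveInterval=0 disables keepalives, so require a positive value.
    if interval <= 0:
        raise ValueError("interval must be positive to enable keepalives")
    if count_max < 1:
        raise ValueError("count_max must be at least 1")
    return f"-o ServerAliveInterval={interval} -o ServerAliveCountMax={count_max}"


def dead_peer_timeout(interval: int, count_max: int) -> int:
    """Approximate seconds before ssh disconnects from an unresponsive server."""
    # One keepalive every `interval` seconds; disconnect after `count_max` misses.
    return interval * count_max


print(keepalive_ssh_args(5, 5))
print(dead_peer_timeout(15, 3))
```

In ansible.cfg such a string would typically be placed in `ssh_args` under `[ssh_connection]`. Setting `ssh_args` replaces Ansible's default SSH arguments, so any defaults still needed (for example ControlMaster/ControlPersist) would have to be repeated alongside the keepalive options.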
Reviewed: https://review.openstack.org/604171
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=c0f41cae9f672c21f05fa7b0cfbfeb66d1cfe296
Submitter: Zuul
Branch: master
commit c0f41cae9f672c21f05fa7b0cfbfeb66d1cfe296
Author: James Slagle <email address hidden>
Date: Thu Sep 20 13:36:03 2018 -0400
Set SSH server keep alive options
Change-Id: I08781fe2aa6472d3fae5c5f5d0babd1f7a3b9b2d
Closes-Bug: #1792343