Comment 8 for bug 1606887

Alexander Kurenyshev (akurenyshev) wrote :

Vladimir Khlyunev and I investigated the problem and found its real cause.
The root of the problem is the TCP session and its KeepAlive parameter.
When we revert a snapshot with skip_timesync=True, the time on the admin node is incorrect (that is, it differs from the real time).
When a client opens a session to nailgun on the master node, everything works fine until the ntpd daemon corrects the time.
If the difference between the old time and the new time is more than 1 minute, the session breaks.
This is the cause of the `ConnectFailure: Unable to establish connection` exception. If we send another request, a new TCP session is opened and everything works fine.
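
To illustrate the last observation only (this is not the fix that was chosen), here is a minimal sketch of a retry wrapper. It assumes the request is a plain callable; `call_with_reconnect` and the local `ConnectFailure` class are hypothetical stand-ins, not names from the test framework or the client library.

```python
class ConnectFailure(Exception):
    """Local stand-in for the client library's connection error (assumption)."""


def call_with_reconnect(send_request, attempts=2):
    """Call send_request(), retrying if the session turns out to be broken.

    send_request is a hypothetical zero-argument callable that performs
    one request against the master node.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return send_request()
        except ConnectFailure as exc:
            # The old session is unusable after the time jump;
            # the next attempt is expected to open a new one.
            last_error = exc
    # Every attempt failed, so surface the last error to the caller.
    raise last_error
```

A wrapper like this would only hide the symptom, since every client would need it and the clock on the node would still be wrong until ntpd corrects it.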

So the current decision, to always sync time on the master node even if the skip_timesync argument is True, is correct and preferable to the alternatives.

Moved to the Fix Committed state.
If this bug appears again, please feel free to reopen it with the appropriate debugging information.