Comment 0 for bug 1933219

Archita Gshosh (architagh) wrote :

In this post, I want to bring your attention to what happens when the clocks of two or more proxy servers are not synchronized. In Swift, all update requests are ordered by the clock timestamp of the proxy node that handles them. In a deployment with only one proxy server, every request gets its timestamp from that server, so all requests are perfectly ordered. However, if the deployment has more than one proxy server behind a load balancer, or if a proxy server is being replaced by another one, their clocks have to be synchronized. A Swift deployment requires a time synchronization method to be up and running as a prerequisite. But if that method fails and the failure is not handled properly, the consequences can be catastrophic, as demonstrated below:
a) Suppose there are two proxy servers, and proxy1’s clock is lagging behind proxy2’s clock.
b) A PUT request for ‘obj1’ comes to proxy2 and gets the timestamp t1_proxy2.
c) Once that request is served, a new PUT request for the same object ‘obj1’ comes to proxy1 and gets the timestamp t2_proxy1.
d) As proxy1’s clock is lagging, t2_proxy1 < t1_proxy2. The object servers check this and reject the request. proxy1 gets a ‘409 Conflict’ response from them.
e) proxy1 then sends a ‘202 Accepted’ response to the client, assuming that this request is old.
f) In this way, every request coming through proxy1 will be rejected for any object that was previously updated through proxy2, for as long as proxy1’s timestamp stays behind the stored one.
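The steps above can be reproduced with a small toy model. This is only a sketch of the ordering rule, not Swift's actual code: the `ObjectServer` and `Proxy` classes, the `clock_offset` field, the 5-second lag, and the 201 status for a successful write are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectServer:
    """Toy object server: keeps only the newest timestamp per object."""
    stored: dict = field(default_factory=dict)  # name -> (timestamp, data)

    def put(self, name, timestamp, data):
        current = self.stored.get(name)
        if current is not None and timestamp <= current[0]:
            return 409  # request is not newer than what is already stored
        self.stored[name] = (timestamp, data)
        return 201


@dataclass
class Proxy:
    """Toy proxy: stamps each request with its own (possibly skewed) clock."""
    server: ObjectServer
    clock_offset: float = 0.0  # seconds relative to true time (hypothetical)

    def put(self, name, data, true_time):
        timestamp = true_time + self.clock_offset
        status = self.server.put(name, timestamp, data)
        # A conflict is treated as "a newer write already exists".
        return 202 if status == 409 else status


server = ObjectServer()
proxy1 = Proxy(server, clock_offset=-5.0)  # lags 5 s behind proxy2
proxy2 = Proxy(server, clock_offset=0.0)

first = proxy2.put("obj1", "old data", true_time=100.0)   # t1_proxy2 = 100.0
second = proxy1.put("obj1", "new data", true_time=101.0)  # t2_proxy1 = 96.0

print(first, second, server.stored["obj1"][1])  # 201 202 old data
```

The client of proxy1 sees a success status, yet the object server still holds the data written through proxy2.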
The worst part is that these updates are permanently lost. The issue can only be identified by monitoring and analyzing the ‘conflict’ responses from the object servers, which may be time-consuming. The storage layer has numerous mechanisms, such as data replication, handoff nodes, replicators, reconstructors, and auditors, to ensure that data is not lost. But at the proxy layer, a lack of clock synchronization can cause data loss even before the data reaches the backend servers. Therefore, I suggest incorporating some method to ensure that the proxy clocks are always synchronized among themselves (even if not with the universal clock) so that this loss never happens.
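As a rough illustration of what such a check could look like, the sketch below compares clock readings collected from the proxies and flags them when they drift apart. It is not an existing Swift feature: the function names, the 1-second tolerance, and the assumption that all readings were sampled at the same instant are hypothetical.

```python
def max_clock_skew(readings):
    """Return the largest difference, in seconds, between any two readings."""
    values = list(readings.values())
    return max(values) - min(values)


def proxies_in_sync(readings, tolerance=1.0):
    """True if all proxy clocks agree within `tolerance` seconds (hypothetical limit)."""
    return max_clock_skew(readings) <= tolerance


# Hypothetical readings, assumed to be sampled at the same instant.
readings = {"proxy1": 1624000095.0, "proxy2": 1624000100.0}

print(max_clock_skew(readings))   # 5.0
print(proxies_in_sync(readings))  # False
```

Note that the check only compares the proxies with each other, so it does not depend on any of them agreeing with the universal clock.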