Time Synchronization Between Proxy Servers

Bug #1933219 reported by Archita Gshosh
Affects: OpenStack Object Storage (swift)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

In this report, I want to bring your attention to the lack of time synchronization between two or more proxy servers. In Swift, all update requests are ordered by the clock timestamp of the proxy node that handles them. In a deployment with only one proxy server, every request gets its timestamp from that server, and hence all requests are perfectly ordered. However, if the deployment has more than one proxy server behind a load balancer, or if one proxy server is being replaced by another, their clocks have to be synchronized. A Swift deployment requires a time synchronization service (e.g., NTP) to be up and running as a prerequisite. But if that service fails and the failure is not handled, the consequences can be catastrophic, as demonstrated below:
a) Suppose there are two proxy servers, and proxy1’s clock is lagging behind proxy2’s clock.
b) A PUT request for ‘obj1’ comes to proxy2 and gets the timestamp t1_proxy2.
c) Once that request has been served, a new PUT request for the same object ‘obj1’ arrives at proxy1 and gets the timestamp t2_proxy1.
d) Because proxy1’s clock is lagging, t2_proxy1 < t1_proxy2. The object servers compare the timestamps and reject the request, so proxy1 gets ‘409 Conflict’ responses from them.
e) proxy1 then sends a ‘202 Accepted’ response to the client, on the assumption that the request is stale.
f) In this way, all requests coming through proxy1 are rejected for any object that proxy2 has already updated (a minimal simulation of this sequence is sketched below).
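
To make the sequence concrete, here is a minimal standalone sketch in Python. It is not Swift’s actual proxy or object-server code; the Proxy and ObjectServer classes and the clock_skew parameter are hypothetical, introduced only to illustrate the race: the object server keeps last-write-wins state keyed by timestamp and rejects older writes with 409, while the proxy surfaces that 409 to the client as 202 Accepted.

    import time


    class ObjectServer:
        def __init__(self):
            self.store = {}  # object name -> (timestamp, data)

        def put(self, name, timestamp, data):
            existing = self.store.get(name)
            if existing is not None and timestamp <= existing[0]:
                return 409  # Conflict: incoming timestamp is not newer
            self.store[name] = (timestamp, data)
            return 201  # Created

    class Proxy:
        def __init__(self, backend, clock_skew=0.0):
            self.backend = backend
            self.clock_skew = clock_skew  # seconds this proxy's clock is off

        def put(self, name, data):
            # The timestamp comes from this proxy's (possibly skewed) clock.
            ts = time.time() + self.clock_skew
            status = self.backend.put(name, ts, data)
            # A 409 from the object server is treated as "newer data already
            # exists", so the client is told 202 and the write is dropped.
            return 202 if status == 409 else status

    backend = ObjectServer()
    proxy2 = Proxy(backend, clock_skew=0.0)
    proxy1 = Proxy(backend, clock_skew=-30.0)  # proxy1 lags 30 seconds behind

    print(proxy2.put("obj1", "v1"))   # 201: stored with t1_proxy2
    print(proxy1.put("obj1", "v2"))   # 202: t2_proxy1 < t1_proxy2, write lost
    print(backend.store["obj1"][1])   # still "v1" -- the newer write is gone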
The worst part is that these updates are permanently lost. The issue can only be identified by monitoring and analyzing the ‘conflict’ responses from the object servers, which may be time-consuming. There are numerous mechanisms at the storage layer, such as data replication, handoff nodes, replicators, reconstructors, and auditors, to ensure that data is not lost. But at the proxy layer, a lack of synchronization can result in data loss before the data ever reaches the backend servers. Therefore, I suggest incorporating some method to ensure that the proxy clocks are always synchronized among themselves (even if not with a universal clock) so that this loss can never happen. One possible monitoring hook is sketched below.
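
As a first step, each proxy could periodically measure its own clock offset and alert (or stop accepting writes) when the skew grows beyond a tolerance. The sketch below assumes the third-party ntplib package and an NTP reference server; the threshold value and server name are placeholders, not Swift settings.

    import sys

    import ntplib

    MAX_SKEW_SECONDS = 0.5        # illustrative tolerance, not a Swift setting
    NTP_SERVER = "pool.ntp.org"   # placeholder; point at your own time source


    def clock_offset(server=NTP_SERVER):
        """Return this node's clock offset in seconds relative to the server."""
        client = ntplib.NTPClient()
        response = client.request(server, version=3)
        return response.offset


    if __name__ == "__main__":
        offset = clock_offset()
        if abs(offset) > MAX_SKEW_SECONDS:
            print(f"WARNING: clock skew {offset:+.3f}s exceeds "
                  f"{MAX_SKEW_SECONDS}s; PUTs routed through this proxy "
                  "may be silently dropped")
            sys.exit(1)
        print(f"clock skew {offset:+.3f}s is within tolerance")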
