POST can cause subsequent EC GET to return 503

Bug #1912014 reported by Alistair Coles
Affects: OpenStack Object Storage (swift)
Status: Fix Committed
Importance: Undecided
Assigned to: Alistair Coles

Bug Description

When backend servers hold a mix of durable fragments and newer non-durable fragments for an EC object, a GET returns 200 with the older durable object (assuming there are enough older durable fragments to reconstruct it). However, if a POST is made to an object in that state, subsequent GETs may return a 503. The bug is not deterministic: some of those GETs still return 200.

The bug is caused by an interaction between the fragments' X-Backend-Timestamp and X-Backend-Data-Timestamp in the proxy EC response handler. As backend 200 responses are added to the proxy response buckets, the bucket timestamp, which is initially equal to the X-Backend-Data-Timestamp, is updated by the X-Backend-Timestamp. Without the newer metadata, these two timestamps are the same, but when the metadata is POSTed the X-Backend-Timestamp takes a newer value and so the proxy response bucket timestamp deviates from the X-Backend-Data-Timestamp of the fragments that it is collecting.
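The timestamp drift described above can be illustrated with a minimal sketch. ToyBucket, backend_headers and the timestamp values are hypothetical names invented for this illustration; this is not Swift's bucket class, only a model of the behaviour described in this report.

```python
T_DATA = '1600000100.00000'   # made-up data timestamp of the fragments
T_META = '1600000200.00000'   # made-up timestamp of a later POST


class ToyBucket:
    """Hypothetical stand-in for a proxy response bucket, not Swift's class."""

    def __init__(self, data_timestamp):
        # The bucket starts out keyed on the fragments' data timestamp.
        self.timestamp = data_timestamp
        self.data_timestamps = []

    def add_good_response(self, headers):
        # Behaviour described in the report: every backend 200 overwrites
        # the bucket timestamp with X-Backend-Timestamp.
        self.data_timestamps.append(headers['X-Backend-Data-Timestamp'])
        self.timestamp = headers['X-Backend-Timestamp']


def backend_headers(data_ts, meta_ts=None):
    # Without POSTed metadata both headers carry the data timestamp.
    return {
        'X-Backend-Data-Timestamp': data_ts,
        'X-Backend-Timestamp': meta_ts or data_ts,
    }


before_post = ToyBucket(T_DATA)
before_post.add_good_response(backend_headers(T_DATA))

after_post = ToyBucket(T_DATA)
after_post.add_good_response(backend_headers(T_DATA, T_META))

print(before_post.timestamp == T_DATA)  # bucket still matches its fragments
print(after_post.timestamp == T_DATA)   # bucket has drifted to the POST time
```

In this toy model the second bucket still collects fragments whose data timestamp is T_DATA, but it now reports T_META as its own timestamp.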

This deviation feeds into the X-Backend-Fragment-Preferences that are sent to backend servers as the proxy tries to hunt down the older durable fragments: the frag prefs *should* exclude data frags with the timestamp of the non-durable data, but instead they exclude only the metadata timestamp. There are no data frags with the metadata timestamp, so the backend servers continue to return the newer non-durable fragments. I have observed repeated GET requests to the same object server, each returning the same non-durable fragment, until the proxy's request allowance is exhausted and the proxy returns a 503 to the client.
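The effect on fragment selection can be sketched with a toy object server. This is a deliberately simplified model: toy_server_get, make_prefs, ON_DISK and the timestamps are hypothetical, the preference format (a JSON list of entries with "timestamp" and "exclude" keys) is an assumption made for the illustration, and the selection rule is far cruder than whatever the real object server does.

```python
import json

T1 = '1600000000.00000'  # made-up timestamp: older durable data
T2 = '1600000100.00000'  # made-up timestamp: newer non-durable data
T3 = '1600000200.00000'  # made-up timestamp: POSTed metadata

# Fragments held by one toy object server as (data timestamp, frag index),
# newest first.
ON_DISK = [(T2, 3), (T1, 3)]


def make_prefs(bucket_timestamp, frag_indexes):
    # Assumed header shape: fragments already seen at a given timestamp.
    return json.dumps(
        [{'timestamp': bucket_timestamp, 'exclude': frag_indexes}])


def toy_server_get(frag_prefs_header):
    """Return the newest on-disk fragment that the prefs do not exclude."""
    prefs = json.loads(frag_prefs_header)
    excluded = {(p['timestamp'], idx) for p in prefs for idx in p['exclude']}
    for frag in ON_DISK:
        if frag not in excluded:
            return frag
    return None


# Prefs built from the drifted bucket timestamp exclude nothing on disk,
# so the same non-durable fragment comes back on every retry.
buggy = toy_server_get(make_prefs(T3, [3]))

# Prefs built from the data timestamp exclude the non-durable fragment,
# so the server falls through to the older durable one.
intended = toy_server_get(make_prefs(T2, [3]))

print(buggy, intended)
```

Under these assumptions, retrying with the metadata timestamp can never make progress, which matches the repeated identical responses described above.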

The bug appears to have come with https://review.opendev.org/c/openstack/swift/+/711342, commit 8f60e0a2607514f05fb873e4a313ab4a93df7601, which enhances the proxy response bucket class to collect bad responses as well as good responses. For bad responses, buckets collect responses with the same status, and we *do* want the bucket timestamp to be updated with X-Backend-Timestamp. For good responses, however, buckets collect responses with the same timestamp, and this must always be the X-Backend-Data-Timestamp.
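The rule stated above (good buckets stay pinned to the data timestamp, bad buckets follow X-Backend-Timestamp) might be sketched as follows. This is not the committed patch; ToyBucket and its fields are hypothetical, and treating 200 as the only "good" status is an assumption made to keep the example short.

```python
T_DATA = '1600000100.00000'   # made-up data timestamp
T_META = '1600000200.00000'   # made-up timestamp of a later POST


class ToyBucket:
    """Hypothetical bucket that treats good and bad responses differently."""

    def __init__(self, status, timestamp=None):
        self.status = status
        self.timestamp = timestamp
        self.responses = []

    def add_response(self, headers):
        self.responses.append(headers)
        if self.status == 200:
            # Good responses are grouped by X-Backend-Data-Timestamp, so
            # the bucket timestamp must never move.
            return
        # Bad responses are grouped by status; track the newest
        # X-Backend-Timestamp seen so far.
        ts = headers.get('X-Backend-Timestamp')
        if ts is not None and (
                self.timestamp is None or float(ts) > float(self.timestamp)):
            self.timestamp = ts


good = ToyBucket(200, T_DATA)
good.add_response({'X-Backend-Data-Timestamp': T_DATA,
                   'X-Backend-Timestamp': T_META})

bad = ToyBucket(404)
bad.add_response({'X-Backend-Timestamp': T_DATA})
bad.add_response({'X-Backend-Timestamp': T_META})

print(good.timestamp, bad.timestamp)
```

With this split, fragment preferences derived from a good bucket would always carry the data timestamp of the fragments it holds.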

The proposed fix will include a unit test that reproduces the bug.

Changed in swift:
assignee: nobody → Alistair Coles (alistair-coles)
status: New → In Progress
Alistair Coles (alistair-coles) wrote :
Changed in swift:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/swift 2.27.0

This issue was fixed in the openstack/swift 2.27.0 release.
