x-newest doesnt return newest metadata for containers

Bug #1384451 reported by Aditya Sawhney
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Confirmed
Importance: Low
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

A container HEAD with the X-Newest header, issued immediately after a POST to the container, sometimes returns stale metadata even though all the container servers are working normally.

The reason is that the put-timestamp and the metadata in the container-stat database are not updated atomically.

swift\container\server.py Line 275
=====================================

{code}

def PUT(self, req):
    ...
    else:  # put container
        # (1) create the container or update its put-timestamp
        created = self._update_or_create(req, broker, timestamp)
        metadata = {}
        # (2) collect the new metadata, stamped with the request timestamp
        metadata.update(
            (key, (value, timestamp))
            for key, value in req.headers.iteritems()
            if key.lower() in self.save_headers or
            is_sys_or_user_meta('container', key))
{code}

The code above has a race condition: one thread updates the container metadata while another thread reads it. If the read executes after (1) but before (2), it gets back stale metadata along with the latest timestamp.

The proxy then compares the put-timestamps, which are the same for all replicas, and falls back to the created-at timestamp (x-timestamp). If the replica with stale data was created later than the other replicas (during the original container PUT), it has a greater x-timestamp than the others, which makes the proxy pick the replica with stale data.
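The window between (1) and (2) can be reproduced deterministically with a toy model. This is a minimal sketch, not Swift code: `ToyReplica` and its methods are hypothetical stand-ins for the container broker, the timestamps are made up, and the two steps are called by hand instead of from competing threads.

```python
class ToyReplica:
    """Hypothetical stand-in for one container replica (not Swift's broker)."""

    def __init__(self, created_at):
        self.created_at = created_at
        self.put_timestamp = created_at
        self.metadata = {}

    def update_put_timestamp(self, timestamp):
        # Step (1): the timestamp is stored on its own.
        self.put_timestamp = timestamp

    def update_metadata(self, metadata):
        # Step (2): the metadata is stored in a separate step.
        self.metadata.update(metadata)

    def head(self):
        # What a concurrent HEAD would observe at this instant.
        return {"put_timestamp": self.put_timestamp,
                "metadata": dict(self.metadata)}


replica = ToyReplica(created_at=100.0)
replica.update_metadata({"color": "red"})

# A writer starts an update at t=200.0 ...
replica.update_put_timestamp(200.0)
# ... and a reader gets in before the metadata is written.
racy_read = replica.head()
replica.update_metadata({"color": "blue"})
final_read = replica.head()

print(racy_read)
print(final_read)
```

The racy read carries the new put-timestamp together with the old metadata, which is exactly the combination that lets a stale replica look current.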

swift\proxy\controllers\base.py Line 857
==========================================

{code}
if sources:
    sources.sort(key=lambda s: source_key(s[0]))
    source, node = sources.pop()

def source_key(resp):
    return float(resp.getheader('x-put-timestamp') or
                 resp.getheader('x-timestamp') or 0)
{code}
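To see how this sort treats replicas whose put-timestamps tie, here is a small sketch that reuses the `source_key` expression quoted above. `FakeResponse`, the node names, and all header values are hypothetical. One caveat about the snippet as quoted: the `or` chain only consults `x-timestamp` when `x-put-timestamp` is missing, so tied replicas compare equal, Python's stable sort keeps them in response order, and `pop()` returns whichever tied replica was last in the list.

```python
class FakeResponse:
    """Hypothetical stand-in for a backend response; only getheader() is modeled."""

    def __init__(self, name, headers):
        self.name = name
        self._headers = {k.lower(): v for k, v in headers.items()}

    def getheader(self, key):
        return self._headers.get(key.lower())


def source_key(resp):
    # Same expression as the quoted proxy code.
    return float(resp.getheader('x-put-timestamp') or
                 resp.getheader('x-timestamp') or 0)


fresh = FakeResponse('fresh', {'X-Put-Timestamp': '200.00000',
                               'X-Timestamp': '100.00000',
                               'X-Container-Meta-Color': 'blue'})
stale = FakeResponse('stale', {'X-Put-Timestamp': '200.00000',
                               'X-Timestamp': '101.00000',
                               'X-Container-Meta-Color': 'red'})

# Responses in arrival order; the stale replica happens to answer last.
sources = [(fresh, 'node-a'), (stale, 'node-b')]
sources.sort(key=lambda s: source_key(s[0]))
source, node = sources.pop()

print(source.name, source.getheader('X-Container-Meta-Color'))
```

In this sketch the stale replica is selected even though a replica with the new metadata responded, because nothing in the key distinguishes the two.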

The code in container\server.py should either atomically update the metadata and put-timestamp or update the put-timestamp AFTER it has updated the metadata.
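One way to picture the atomic option is a single SQL statement inside one transaction. The sketch below uses Python's `sqlite3` module with a deliberately simplified, hypothetical one-row table; it is not Swift's actual schema or broker code, and it does not model how Swift merges per-key metadata timestamps.

```python
import json
import sqlite3

# Hypothetical, simplified table; Swift's real schema has more columns.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE container_stat (put_timestamp TEXT, metadata TEXT)')
conn.execute('INSERT INTO container_stat VALUES (?, ?)',
             ('100.00000', json.dumps({'color': 'red'})))
conn.commit()


def update_atomically(conn, timestamp, metadata):
    # One UPDATE in one transaction: a reader sees either the old
    # (timestamp, metadata) pair or the new one, never a mix.
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            'UPDATE container_stat SET put_timestamp = ?, metadata = ?',
            (timestamp, json.dumps(metadata)))


update_atomically(conn, '200.00000', {'color': 'blue'})
put_timestamp, raw_metadata = conn.execute(
    'SELECT put_timestamp, metadata FROM container_stat').fetchone()

print(put_timestamp, json.loads(raw_metadata))
```

The alternative ordering (write the metadata first, then the put-timestamp) would narrow the window in the other direction: a reader could briefly see new metadata with an old timestamp, which is harmless for X-Newest selection.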

This is a serious defect which can lead to inconsistencies in our system which isn't acceptable.

clayg (clay-gerrard) wrote :

I didn't ever really think about using X-newest with containers - that works?

Oh, how interesting, if you do your metadata updates with PUT instead of POST you can get a newer timestamp... hrmm....

Aren't container metadata updates sorta racy anyway? What with read modify write all up in there?

Also you say "[in]consistencies in our system which isn't acceptable" - can you elaborate? AP systems like Swift tend to push clients into gracefully handling stale reads. I'd be fine with minimizing that if the performance implications are reasonable - but what do you expect x-newest to do under failure?

Aditya Sawhney (aditya-sawhney) wrote :

The bottom line is that we want X-Newest to work as expected under normal conditions (i.e. when there are no failures). The contract is that it should return the newest data across all replicas at a given point in time.

To handle failure conditions, we have built a stronger consistency model (by enhancing the Swift code) layered on top of X-Newest: it fails the request if any of the replicas is unreachable/unavailable so that the client can retry later.

I don't think there are performance implications. The code needs to be modified to handle the create and update case differently.

description: updated
description: updated
clayg (clay-gerrard)
summary: - x-newest doesnt return newest data under certain race conditions
+ x-newest doesnt return newest data for containers
summary: - x-newest doesnt return newest data for containers
+ x-newest doesnt return newest metadata for containers
clayg (clay-gerrard)
summary: - x-newest doesnt return newest metadata for containers
+ x-newest doesnt return newest metadata for containers updated with POST
summary: - x-newest doesnt return newest metadata for containers updated with POST
+ x-newest doesnt return newest metadata for containers
clayg (clay-gerrard) wrote :

I think this *almost* got fixed when Kota fixed POST to update PUT timestamp - https://review.openstack.org/#/c/198632/

It's always going to be a *little* racy because the "most recent update" may not be the aggregate of *all* the latest updates - but at least we have the signal.

The problem is POST now updates X-Put-Timestamp - but not X-Timestamp, which still remains the time the container was created:

ubuntu@saio:~$ curl http://localhost:6041/sdb4/450/AUTH_test/test -v
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 6041 (#0)
> GET /sdb4/450/AUTH_test/test HTTP/1.1
> Host: localhost:6041
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Length: 13
< X-Backend-Timestamp: 1501110795.41336
< X-Container-Object-Count: 1
< X-Put-Timestamp: 1501112155.68218
< X-Backend-Put-Timestamp: 1501112155.68218
< X-Container-Meta-Color: eal
< X-Backend-Delete-Timestamp: 0000000000.00000
< X-Container-Bytes-Used: 1172
< X-Timestamp: 1501110795.41336
< X-Backend-Storage-Policy-Index: 0
< Content-Type: text/plain; charset=utf-8
< X-Backend-Status-Changed-At: 1501110795.39889
< Date: Wed, 26 Jul 2017 23:38:20 GMT
<
pete_test.py
* Connection #0 to host localhost left intact

so then X-Newest (which *totally* works with containers) will make requests to all container db's and then always return whichever one was created most recently (instead of whichever one was *updated* most recently)
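A rough illustration of the difference between the two orderings, as a sketch only: replica A uses the `X-Timestamp` and `X-Put-Timestamp` values from the curl output above, while replica B is hypothetical (created slightly later and never reached by the POST). The dictionaries stand in for response headers; none of this is Swift code.

```python
# Replica A: header values taken from the curl output above.
replica_a = {'name': 'A',
             'x-timestamp': '1501110795.41336',
             'x-put-timestamp': '1501112155.68218'}
# Replica B: hypothetical; created a bit later, never saw the POST.
replica_b = {'name': 'B',
             'x-timestamp': '1501110795.50000',
             'x-put-timestamp': '1501110795.50000'}

replicas = [replica_a, replica_b]

# Ordering by creation time favors the replica that was created last.
by_created = max(replicas, key=lambda r: float(r['x-timestamp']))
# Ordering by put-timestamp favors the replica that was updated last.
by_updated = max(replicas, key=lambda r: float(r['x-put-timestamp']))

print(by_created['name'], by_updated['name'])
```

Here the creation-time ordering selects B, which never received the update, while the put-timestamp ordering selects A.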

Changed in swift:
importance: Undecided → Low
status: New → Confirmed
tags: added: container-server
tags: added: low-hanging-fruit