GET/HEAD should stop searching more nodes once a tombstone is reached

Bug #1560574 reported by Caleb Tennis
This bug affects 7 people
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Medium
Assigned to: Thiago da Silva

Bug Description

If we encounter a tombstone, we should just return a 404 rather than continuing to search more nodes for actual object data.

rohita joshi (rjoshi16)
Changed in swift:
assignee: nobody → rohita joshi (rjoshi16)
rohita joshi (rjoshi16)
Changed in swift:
assignee: rohita joshi (rjoshi16) → nobody
Revision history for this message
Samuel Merritt (torgomatic) wrote :

I'm not so sure this is a good idea. It may end up just pushing the problem around.

Right now, we have a case where a PUT to 3 nodes, then a DELETE to 2 nodes, then a GET might return a 200. However, we have another case where a PUT to 3 nodes, then a DELETE to 3 nodes, then a PUT to 2 nodes, then a GET will always return a 200.

I think that, if we make the requested behavior change in Swift, we'll have another bug report in six months complaining of a 404 after a PUT(3)/DELETE(3)/PUT(2)/GET sequence.

I could be wrong though.
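
To make the two sequences concrete, here is a minimal, self-contained sketch (plain Python, not Swift proxy code); the node ordering and helper names are illustrative assumptions, chosen so that the node with stale state happens to answer first:

    # Each replica node holds either object data or a tombstone, stamped with
    # the timestamp of the request that wrote it.
    def get_current(nodes):
        # today's behavior: treat a tombstone like any other 404 and keep searching
        return 200 if any(n['kind'] == 'data' for n in nodes) else 404

    def get_short_circuit(nodes):
        # requested behavior: stop as soon as a tombstone is found
        for n in nodes:
            if n['kind'] == 'tombstone':
                return 404
            if n['kind'] == 'data':
                return 200
        return 404

    # PUT(3) then DELETE(2): one node still holds stale object data
    case1 = [{'kind': 'tombstone', 'ts': 2},
             {'kind': 'tombstone', 'ts': 2},
             {'kind': 'data', 'ts': 1}]

    # PUT(3), DELETE(3), then PUT(2): one node still holds a stale tombstone
    case2 = [{'kind': 'tombstone', 'ts': 2},
             {'kind': 'data', 'ts': 3},
             {'kind': 'data', 'ts': 3}]

    print(get_current(case1), get_short_circuit(case1))  # 200 404
    print(get_current(case2), get_short_circuit(case2))  # 200 404, the stale tombstone hides live data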

Revision history for this message
Caleb Tennis (ctennis) wrote :

Perhaps, but at least it's deterministic. I would expect that if the system were littered with various copies of both the object and tombstones, a GET would give me undefined results. Hopefully the ticket I opened about using X-Newest will sort out that particular issue.

However, why do we treat tombstones as normal 404s and then search 2x replicas? If I find one copy of the object, I return the result, whether or not it's the latest and greatest. But I can go through 5 tombstones, find a copy of the object, and still return it?

It feels like tombstone presence should count the same as object presence, subject to normal eventual consistency, and let X-Newest sort out conflict resolution.

Revision history for this message
Caleb Tennis (ctennis) wrote :

Perhaps what would make more sense is to keep track of the fact that we found a tombstone, but keep searching for object data, and if we find object data at the next location, do a timestamp check? If the object data is newer, return it; if the tombstone is newer, don't.

Revision history for this message
clayg (clay-gerrard) wrote :

> Perhaps what would make more sense is to keep track of the fact that we found
> a tombstone, but keep on the search for object data, and if we find object
> data in the next location, do a timestamp check?

^ This 100%

We should still have a GET dig around in the cluster trying to find anything that will 200 before it hands back a 404 - the question is what to do with a backend 200 response when we find it?

If it's older than the other tombstones we found - is it still useful to return it knowing that it's a stale read?

As long as 404's include the x-backend-timestamp of tombstones, I think it would be straightforward to keep a list of found tombstone timestamps on the GETorHEADHandler instance and, when we find a 200, only add it to valid sources if its x-backend-timestamp is newer than all of the previously discovered 404's.

... there's probably still a case where the first node to respond is a 200 and we haven't collected any 404's yet - but I don't think we should ever ask a second node to "verify" the 200. Without x-newest I think the change is still a "reasonable endeavor"; but I think in the write affinity case it could really help to not return stale reads from handoffs.
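
A rough sketch of the bookkeeping described above, assuming backend 404s carry an X-Backend-Timestamp header; the class and attribute names are illustrative only, not the actual GETorHEADHandler interface:

    class TombstoneAwareGetOrHead(object):
        def __init__(self):
            self.tombstone_timestamps = []  # timestamps seen on backend 404s
            self.sources = []               # 200 responses we are willing to serve

        def handle_backend_response(self, status, headers, source):
            ts = float(headers.get('X-Backend-Timestamp', 0))
            if status == 404:
                # remember when this node says the object was deleted
                self.tombstone_timestamps.append(ts)
            elif status == 200:
                # only keep data that is newer than every tombstone found so far
                if all(ts > t for t in self.tombstone_timestamps):
                    self.sources.append((ts, source))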

Changed in swift:
importance: Undecided → Medium
status: New → Confirmed
Changed in swift:
assignee: nobody → Thiago da Silva (thiagodasilva)
Changed in swift:
status: Confirmed → In Progress
Revision history for this message
clayg (clay-gerrard) wrote :

https://review.openstack.org/#/c/371150/

^ currently in conflict; but I bet it'll be great when it's rebased!

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/371150
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=8d882095375dc53acdafafac40aa2efc3bafbcd4
Submitter: Zuul
Branch: master

commit 8d882095375dc53acdafafac40aa2efc3bafbcd4
Author: Thiago da Silva <email address hidden>
Date: Thu Sep 15 16:45:06 2016 -0400

    Return 404 on a GET if tombstone is newer

    Currently the proxy keeps iterating through
    the connections in hope of finding a success even
    if it already has found a tombstone (404).

    This change changes the code a little bit to compare
    the timestamp of a 200 and a 404, if the tombstone is
    newer, then it should be returned, instead of returning
    a stale 200.

    Closes-Bug: #1560574

    Co-Authored-By: Tim Burke <email address hidden>
    Change-Id: Ia81d6832709d18fe9a01ad247d75bf765e8a89f4
    Signed-off-by: Thiago da Silva <email address hidden>
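
Reduced to its core comparison, the merged change behaves roughly like the sketch below (an illustrative reduction, not the literal diff):

    def resolve(best_data_timestamp, newest_tombstone_timestamp):
        # serve the data only if it is newer than any tombstone found;
        # a newer tombstone wins, so a stale 200 is never returned
        if best_data_timestamp is None:
            return 404
        if (newest_tombstone_timestamp is not None and
                newest_tombstone_timestamp > best_data_timestamp):
            return 404
        return 200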

Changed in swift:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/deep)

Fix proposed to branch: feature/deep
Review: https://review.openstack.org/517718

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/deep)

Reviewed: https://review.openstack.org/517718
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=5c564e53966b74d6d7589c3505dd33816b3c19f1
Submitter: Zuul
Branch: feature/deep

commit 77a8a4455d4a2bd0a9b9146a01acfe0d93fe7550
Author: Tim Burke <email address hidden>
Date: Tue Oct 3 00:11:07 2017 +0000

    Let clients request heartbeats during SLO PUTs

    An SLO PUT requires that we HEAD every referenced object; as a result, it
    can be a very time-intensive operation. This makes it difficult as a
    client to differentiate between a proxy-server that's still doing work and
    one that's crashed but left the socket open.

    Now, clients can opt-in to receiving heartbeats during long-running PUTs
    by including the query parameter

        heartbeat=on

    With heartbeating turned on, the proxy will start its response immediately
    with 202 Accepted then send a single whitespace character periodically
    until the request completes. At that point, a final summary chunk will be
    sent which includes a "Response Status" key indicating success or failure
    and (if successful) an "Etag" key indicating the Etag of the resulting SLO.

    This mechanism is very similar to the way bulk extractions and deletions
    work, and even the way SLO behaves for ?multipart-manifest=delete requests.

    Note that this is opt-in: this prevents us from sending the 202 response
    to existing clients that may mis-interpret it as an immediate indication
    of success.

    Co-Authored-By: Alistair Coles <email address hidden>
    Related-Bug: 1718811
    Change-Id: I65cee5f629c87364e188aa05a06d563c3849c8f3

commit 92705bb36b4a771a757977d11875e7b929c41741
Author: David Rabel <email address hidden>
Date: Thu Nov 2 12:38:41 2017 +0100

    Fix indent in overview_policies.rst

    Change-Id: I7f070956d8b996db798837392adfca4483067aea

commit feee3998408e5ed03563c317ad9506ead92083a6
Author: Clay Gerrard <email address hidden>
Date: Fri Sep 1 14:15:45 2017 -0700

    Use check_drive consistently

    We added check_drive to the account/container servers to unify how all
    the storage wsgi servers treat device dirs/mounts. Thus pushes that
    unification down into the consistency engine.

    Drive-by:
     * use FakeLogger less
     * clean up some repeititon in probe utility for device re-"mounting"

    Related-Change-Id: I3362a6ebff423016bb367b4b6b322bb41ae08764
    Change-Id: I941ffbc568ebfa5964d49964dc20c382a5e2ec2a

commit 29e9ae1cc5b74dce205643c101601555c06b67dc
Author: Tim Burke <email address hidden>
Date: Thu Oct 19 18:53:04 2017 +0000

    Make xml responses less insane

    Looking at bulk extractions:

       $ tar c *.py | curl http://saio:8090/v1/AUTH_test/c?extract-archive=tar \
          -H accept:application/xml -T -
       <?xml version="1.0" encoding="UTF-8"?>
       <delete>
       <number_files_created>2</number_files_created>
       <response_body></response_body>
       <response_status>201 Created</response_status>
       <errors>
       </errors>
       </delete>

    Or SLO upload failures:

       $ curl http://saio:8090/v1/AUTH_...

tags: added: in-feature-deep
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/s3api)

Fix proposed to branch: feature/s3api
Review: https://review.openstack.org/517838

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/s3api)

Reviewed: https://review.openstack.org/517838
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=76917dfb1dcc509ca9fa181639f37c50ade0434a
Submitter: Zuul
Branch: feature/s3api


tags: added: in-feature-s3api
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.16.0

This issue was fixed in the openstack/swift 2.16.0 release.
