ssync syncs an expired object as a tombstone

Bug #1652323 reported by Alistair Coles on 2016-12-23
OpenStack Object Storage (swift)

Bug Description

With replication method = ssync:

When ssync sender on node A syncs an expired object which has an *expired* delete-at header value of t_expire it sends a DELETE subrequest which generates a tombstone on the receiver node B at t0.

So after sync we have the expired object's .data file on sender node A and a t0.ts tombstone on receiver node B. That's not good.

When the expirer runs and tries to delete the expired object, the expirer's DELETE to node A succeeds and node A gets t_expire.ts. The expirer's DELETE to node B fails with 412 because the tombstone t0.ts on node B does not have an x-delete-at value that matches the x-if-delete-at header sent by the expirer. So the result is t_expire.ts on node A and t1.ts on node B.
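The 412 arises from the conditional-delete check on the object server; a minimal illustrative sketch of that check (not Swift's actual handler code):

```python
# Illustrative check (not Swift's actual object-server handler) showing
# why the expirer's conditional DELETE gets a 412 from node B: the
# request only succeeds when the stored x-delete-at matches the
# X-If-Delete-At header, and a bare tombstone has nothing to match.

def conditional_delete(stored_metadata, request_headers):
    """Return the HTTP status a DELETE with X-If-Delete-At would get."""
    if_delete_at = request_headers.get('X-If-Delete-At')
    if if_delete_at is None:
        return 204  # unconditional DELETE always proceeds
    if stored_metadata is None:
        return 412  # only a tombstone on disk: no x-delete-at to match
    if stored_metadata.get('X-Delete-At') != if_delete_at:
        return 412  # stored expiry disagrees with the expirer's record
    return 204
```

On node A the stored .data carries an x-delete-at equal to t_expire, so the expirer's DELETE succeeds; on node B only the t0.ts tombstone exists, hence the 412.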

The next time the replicator runs, this anomaly will be corrected and both nodes will end up with t_expire.ts. However, apart from the fact that a replication process should not actually generate inconsistent state, the anomaly has undesirable side-effects:

1. The expirer DELETE to node B fails and is therefore retried (by default 3 times)

2. Because the expirer DELETE to node B fails, some container db listings are not updated with the delete, so container listings remain inconsistent after expiration.

(Until the related fix was merged, this could all be seen to play out with the probe test in test/probe/, which failed when the replication method is ssync but passed with rsync.)

The solution is likely to be for the ssync sender to be more discriminating when it opens a diskfile and gets a DiskFileDeleted exception. When the exception is DiskFileExpired, the sender should probably attempt send_put/send_post rather than send_delete.
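The proposed dispatch could look roughly like this (a hypothetical sketch, not Swift's actual ssync code; the exception names mirror swift.common.exceptions, where DiskFileExpired is a subclass of DiskFileDeleted — which is why a plain `except DiskFileDeleted` today treats an expired object like a tombstone):

```python
# Hypothetical sketch of the proposed sender-side fix. DiskFileExpired
# subclasses DiskFileDeleted, so it must be caught first to distinguish
# "expired but still on disk" from "genuinely deleted".

class DiskFileDeleted(Exception):
    pass

class DiskFileExpired(DiskFileDeleted):
    pass

def choose_sync_action(open_diskfile):
    """Decide which ssync action to take for one object hash."""
    try:
        open_diskfile()
    except DiskFileExpired:
        # expired but not deleted: replicate the on-disk .data/.meta
        return 'send_put/send_post'
    except DiskFileDeleted:
        # genuine tombstone: replicate the deletion
        return 'send_delete'
    return 'send_put/send_post'
```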

Changed in swift:
importance: Undecided → Medium
clayg (clay-gerrard) wrote :
Changed in swift:
status: New → Confirmed
Alistair Coles (alistair-coles) wrote :

related fix/workaround for the failing test

Submitter: Jenkins
Branch: master

commit 0fd7257fc8f933b288f352bb1b843c76d2f53674
Author: Clay Gerrard <email address hidden>
Date: Tue Jan 3 16:10:42 2017 -0800

    Fix flaky expirer probe test

    This does not address the underlying lp bug #1652323 with ssync that
    makes ssync inconsistent with expired objects.

    Change-Id: Ia12dc27a12418637587c57ff4f5744d00c661691
    Related-Bug: #1652323
    Closes-Bug: #1653836


Some thoughts as to how this bug might be addressed:

(I don't necessarily like any of these options; I am just writing them down for the record.)

1. Sync what we have on disk regardless of its "application level" state. This means we need ssync to be able to open a diskfile that has expired, so adding that capability to diskfile. Then the ssync sender can replicate the expired diskfile to the receiver. There are at least two problems with this: (a) the receiving object server will reject an attempt to PUT an object with a delete-at older than the present time, and (b) in the case of EC sync jobs, the ssync sender will be unable to reconstruct a frag for an expired object because other nodes will not serve up their expired fragments to the reconstructor. We would need object server support to selectively GET and PUT expired objects. Plus we would sync potentially large data content that will never be served to a client, which seems wasteful.

(Might this get more feasible if the only thing needing to be sync'd is metadata? In the EC case we then shouldn't need to GET other frags, and we'd just need the receiving POST to be forced to accept the older x-delete-at.)

(1a. To avoid moving large amounts of data unnecessarily, PUT an empty object to the receiver in order to achieve hash consistency. This has the same blockers as (1), i.e. we can't GET or PUT expired objects.)

(1) seems like a lot of work/change to replicate data that clients will never read.

2. ssync performs expiry deletion:

ssync_sender, on getting a DiskFileExpired during send_put, gets the delete-at time from the object metadata and sends a DELETE to the receiver with the delete-at time and an X-If-Delete-At header, effectively doing the job of the expirer (which is my dislike of this option: we'd end up duplicating expirer logic in ssync).
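Under this option the sender would, roughly, build the same request the expirer builds. A sketch, assuming (as the t_expire.ts naming in the bug description implies) that the delete-at time is used as the DELETE's X-Timestamp; the function name is hypothetical:

```python
# Hypothetical helper: construct the headers an expirer-style DELETE
# would carry, derived from the expired diskfile's metadata.

def build_expiry_delete_headers(diskfile_metadata):
    delete_at = diskfile_metadata['X-Delete-At']
    return {
        'X-Timestamp': delete_at,      # tombstone lands as t_expire.ts
        'X-If-Delete-At': delete_at,   # only delete if receiver agrees
    }
```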

This attempt to expire the remote object will cause a 412 for the DELETE subrequest with X-If-Delete-At if the receiving node has a diskfile with a different x-delete-at (including an older x-delete-at!), which is quite possible given that it's out of sync but not necessarily missing altogether. That's bad because it will contribute to the ssync receiver reporting failure for the entire ssync job, and possibly terminating early. Then ssync would keep retrying the same thing, until...IDK, forever?? :/

The ssync sender could also delete the local object file for good measure, but whilst that makes the sender and receiver consistent, it creates inconsistency with other nodes. That inconsistency would be fixed by subsequent expirer or replication activity, so maybe that's ok. Note that the object server does NOT delete an object that has expired during GET handling, so there is no precedent for a process other than the expirer performing deletion of expired objects. Maybe there is a good reason not to do that??

Barring any gotchas with ssync doing the deletion, (2) might be the safest option in terms of always progressing towards the ultimately correct consistent state.

3. Ignore the expired object for sync - i.e. do not sync anything for the expired object and assume that the expirer will eventually expire all replicas/fragments, after which ssync will be able ...


Romain LE DISEZ (rledisez) wrote :

Another similar, but a bit different, case I just hit.

# ls -1

.data file contains X-Delete-At: 1454619654
.meta does not contain X-Delete-At info.

So, the object is valid because the most recent .meta metadata does not contain an expiration.

But SSYNC fails to replicate it, because the ssync_receiver returns a 400 Bad Request, body="X-Delete-At in past".

It turns out the SSYNC sender tries to PUT the data file with its metadata, then POST the meta file with its metadata. So, the data file triggers the 400 error.

Even if the object server accepted the data file, it would probably refuse the following POST because the object would be expired.
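Why the object in this comment is nevertheless live, as a simplified model: the newest .meta replaces the mutable metadata from the .data file wholesale, so an absent X-Delete-At in the .meta cancels the expiry. An illustrative helper, not Swift's diskfile code:

```python
import time

# Simplified model (not Swift's diskfile code) of how the newest .meta
# determines expiry: POST metadata replaces the mutable metadata from
# the .data file, so a .meta without X-Delete-At un-expires the object.

def is_expired(data_meta, post_meta=None, now=None):
    meta = post_meta if post_meta is not None else data_meta
    delete_at = meta.get('X-Delete-At')
    if delete_at is None:
        return False  # no expiry in the newest metadata: object is live
    now = time.time() if now is None else now
    return int(delete_at) <= now
```

With the .data carrying X-Delete-At: 1454619654 and a newer empty-expiry .meta, the object is live, yet the sender's PUT of the data file still presents the stale past X-Delete-At to the receiver.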

summary: - ssync syncs an expired object as a tombstone, probe test_object_expirer
- fails
+ ssync syncs an expired object as a tombstone
description: updated
Alistair Coles (alistair-coles) wrote :

I think the bug described in Romain's last comment #6 is

Alistair Coles (alistair-coles) wrote :

The fix [1] for this bug introduced object server and diskfile changes that help us work toward solution 1 enumerated in comment #5:

- we can now open expired diskfiles
- the x-backend-replication header causes x-delete-at older than 'now' to be allowed on a PUT

so we should be able to ssync an expired object (the ssync sender can open it, and the receiver will accept it).

For the EC case we need the additional step of being able to GET expired fragments. That could be achieved by including an x-backend-replication header with reconstructor fragment GET requests and modifying the object server to allow GET of an expired object when the header is present (much the same as the change [1] that was made to the object server POST method).
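The gate being described could be sketched like this (the X-Backend-Replication header is real; the function and its shape are illustrative only):

```python
# Illustrative sketch of the proposed object-server gate: expired
# objects are normally hidden, but internal replication traffic that
# carries X-Backend-Replication: True is allowed through (as the fix
# [1] already does for POST, and as proposed here for GETs).

def should_serve(metadata, request_headers, now):
    """Return True if a GET for this object should be served."""
    delete_at = metadata.get('X-Delete-At')
    expired = delete_at is not None and int(delete_at) <= now
    if not expired:
        return True
    return request_headers.get('X-Backend-Replication') == 'True'
```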


Alistair Coles (alistair-coles) wrote :
Changed in swift:
importance: Medium → High

Submitter: Jenkins
Branch: master

commit 69df458254413d2045d7cdb00dd355d8b8783952
Author: Romain LE DISEZ <email address hidden>
Date: Fri Jun 9 14:23:05 2017 +0200

    Allow to rebuild a fragment of an expired object

    When a fragment of an expired object was missing, the reconstructor
    ssync job would send a DELETE sub-request. This leads to situation
    where, for the same object and timestamp, some nodes have a data file,
    while others can have a tombstone file.

    This patch forces the reconstructor to reconstruct a data file, even
    for expired objects. DELETE requests are only sent for tombstoned

    Co-Authored-By: Alistair Coles <email address hidden>
    Closes-Bug: #1652323
    Change-Id: I7f90b732c3268cb852b64f17555c631d668044a8

Changed in swift:
status: Confirmed → Fix Released

This issue was fixed in the openstack/swift 2.15.1 release.

Submitter: Zuul
Branch: master

commit 7afc6a06eed1e9e3fdbea756074111b8a209d266
Author: Clay Gerrard <email address hidden>
Date: Thu Jan 11 14:21:39 2018 -0800

    Remove un-needed hack in probetest

    If you ran this probe test with ssync before the related change it would
    demonstrate the related bug. The hack isn't harmful, but it isn't
    needed anymore.

    Related-Change-Id: I7f90b732c3268cb852b64f17555c631d668044a8
    Related-Bug: 1652323

    Change-Id: I09e3984a0500a0f4eceec392e7970b84070a5b39


Submitter: Zuul
Branch: feature/s3api

commit 88eea33ccd1875af811b59d15df55e2bffa27f77
Author: Clay Gerrard <email address hidden>
Date: Thu Jan 11 13:36:09 2018 -0800

    Recenter builder test expectation around random variance

    ... in order to make the test pass with more seeds and fail less
    frequently in the gate.

    Change-Id: I059e80af87fd33a3b6c0731fbad62e035215eca5

commit d924fa759967b7cdca0d91f21112725f6099a254
Author: Samuel Merritt <email address hidden>
Date: Tue Jan 16 22:19:09 2018 -0800

    Remove old post-as-copy leftovers from tests.

    Since commit 1e79f828, we don't need to test with post_as_copy=True
    any more since we haven't got post_as_copy at all.

    Change-Id: I9c96ce0b812d877bbe11bdb50eb160d6ffa5933d

commit dfa0c4e604fb931d232395599bd0e7b0f11441ee
Author: Alistair Coles <email address hidden>
Date: Wed Jan 17 12:04:45 2018 +0000

    Preserve expiring object behaviour with old proxy-server

    The related change [1] causes expiring object records to no longer be
    created if the X-Delete-At-Container header is not sent to the object
    server, but old proxies prior to [2] (i.e. releases prior to 1.9.0)
    did not send this header.

    The goal of [1] can be alternatively achieved by making expiring
    object record creation be conditional on the X-Delete-At-Host header.

    [1] Related-Change: I20fc2f42f590fda995814a2fa7ba86019f9fddc1
    [2] Related-Change: Id0873a3f2198ce285fe0b0c777738eff38bc2438

    Change-Id: Ia0081693f01631d3f2a59612308683e939ced76a

commit d707fc7b6d0ceb4556dddfc258c5de8c4baff05c
Author: Clay Gerrard <email address hidden>
Date: Tue Jan 16 16:30:13 2018 -0800

    DRY out tests until the stone bleeds

    Can we go deeper!?

    Change-Id: Ibd3b06542aa1bfcbcb71cc98e6bb21a6a67c12f4

commit ba8f1b1c3786df4e79fc3f9e4747d7cfb9072b6f
Author: Alistair Coles <email address hidden>
Date: Wed Jan 17 15:25:33 2018 +0000

    Fix intermittent unit test failure

    test_check_delete_headers_removes_delete_after was
    failing intermittently due to rounding of float time

    Change-Id: Ia126ad6988f387bbd2d1f5ddff0a56d457a1fc9b
    Closes-Bug: #1743804

commit e747f94313f315fdf8d8fc01fb0c5aac60c33897
Author: Kota Tsuyuzaki <email address hidden>
Date: Wed Dec 27 14:37:29 2017 +0900

    Fix InternalClient to drain response body if the request fails

    If we don't drain the body, the proxy logging in the internal client
    pipeline will log 499 client disconnect instead of actual error response

    For error responses, we try to do the most helpful thing using swob's
    closing and caching response body attribute. For non-error responses
    which are returned to the client, we endeavour to keep the app_iter
    intact and unconsumed, trusting/expecting the caller to do the right
    thing is the only reasonable interface. We must cleanly close any WSGI
    app_iter which we do not return to the client rega...

tags: added: in-feature-s3api

Submitter: Zuul
Branch: feature/deep

commit ddb13aa5eab03b6993887eb02260b4bc0b256922
Author: vxlinux <email address hidden>
Date: Sat Jan 20 17:23:35 2018 +0800

    Remove redundant blank space in README.rst

    Change-Id: If347476e3b9185921ff174d3f8170a1c4d0622e8

commit 12f874534925b52f9d1c91580794eb9e5e9a4589
Author: vxlinux <email address hidden>
Date: Fri Jan 19 16:54:26 2018 +0800

    Add Docstrings to validate_replicas_by_tier

    New common functions should have Docstrings

    Change-Id: Icbb3cdf38509fd6d034cbb2271786559780a7b68

commit d2034cd7b6946829a7d95c4d2c71d4322f80e855
Author: Clay Gerrard <email address hidden>
Date: Tue Jan 16 17:03:38 2018 -0800

    Keep object-updater stats logging consistent

    If we're going to encapsulate the stats tracking it seems reasonable if
    we ever add any more metrics we can reduce the number of places we need
    to update log messages.

    Change-Id: I187cf6cfec1e0a9138b709fa298e1991aa809ec4

commit cd2c73fd955317a3f40758cef45ee48bef8fbc79
Author: Tim Burke <email address hidden>
Date: Tue Jan 16 01:07:35 2018 +0000

    internal_client: Don't retry when we expect the same response

    This boils down to 404, 412, or 416; or 409 when we provided an

    This means, among other things, that the expirer won't issue 3 DELETEs
    every cycle for every stale work item.

    Related-Change: Icd63c80c73f864d2561e745c3154fbfda02bd0cc
    Change-Id: Ie5f2d3824e040bbc76d511a54d1316c4c2503732

commit 222df9185782f59ffdc96c3534afaa2fb1361235
Author: chengebj5238 <email address hidden>
Date: Thu Jan 18 17:03:11 2018 +0800

    Modify redirection URL and broken URL

    Change-Id: I9a04cb2fbe61e1fbd8185ab2fac9abbcea4d55cc

commit d1656e334959e09d13eea98c2696e58c77e4ab91
Author: Tim Burke <email address hidden>
Date: Fri Jan 12 13:17:45 2018 -0800

    slo: Send ETag header in 206 responses

    Why weren't we doing that before?? The etag should be the same as for
    GET/HEAD, and by sending it, we can assure resuming clients that they're
    downloading the same object even if they didn't include an If-Match

    Change-Id: I4ccbd1ae3a909ecb4606ef18211d1b868f5cad86
    Related-Change: Ic11662eb5c7176fbf422a6fc87a569928d6f85a1

commit 88eea33ccd1875af811b59d15df55e2bffa27f77
Author: Clay Gerrard <email address hidden>
Date: Thu Jan 11 13:36:09 2018 -0800

    Recenter builder test expectation around random variance

    ... in order to make the test pass with more seeds and fail less
    frequently in the gate.

    Change-Id: I059e80af87fd33a3b6c0731fbad62e035215eca5

commit f64c00b00aa8df31a937448917421891904abdc8
Author: Samuel Merritt <email address hidden>
Date: Fri Jan 12 07:17:18 2018 -0800

    Improve object-updater's stats logging

    The object updater has five different stats, but its logging only told
    you two of them (successes and failures), and it only told you after
    finishing all the async_pendings for a device. If y...

tags: added: in-feature-deep