ssync fails to replicate an object that had x-delete-at removed

Bug #1683689 reported by Romain LE DISEZ
Affects: OpenStack Object Storage (swift)
Importance: High
Assigned to: Unassigned

Bug Description

How to reproduce on a SAIO (the example uses a replication policy, but the bug also applies to EC):
* Upload an object with X-Delete-After: 60
$ swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing post default
$ swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing upload default test -H "X-Delete-After: 60"

* Before the object expires, update metadata without X-Delete-(At|After)
$ swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing post default test -H "X-Object-Meta-Test: test"

* Wait for the original expiration date to pass (60 seconds in this example)

* Remove a replica of the partition holding the object
$ ls -1 /srv/*/node/*/objects/80/d68/1419030ea4ea80b73e0719df514d3d68/
/srv/2/node/sdb2/objects/80/d68/1419030ea4ea80b73e0719df514d3d68/:
1492504501.71970.data
1492504506.09752.meta

/srv/3/node/sdb3/objects/80/d68/1419030ea4ea80b73e0719df514d3d68/:
1492504501.71970.data
1492504506.09752.meta

/srv/4/node/sdb4/objects/80/d68/1419030ea4ea80b73e0719df514d3d68/:
1492504501.71970.data
1492504506.09752.meta

$ rm -rf /srv/3/node/sdb3/objects/80/d68/1419030ea4ea80b73e0719df514d3d68/
$ rm -rf /srv/3/node/sdb3/objects/80/hashes.*

* Run the object-replicator on that partition with SSYNC (object-replicator/sync_method = ssync)
$ swift-object-replicator /etc/swift/object-server/2.conf -p 80 -v -o
object-replicator: Running object replicator in script mode.
object-replicator: 127.0.0.1:6030/sdb3/80 Unexpected response: ":ERROR: 500 'ERROR: With :UPDATES: 1 failures to 1 successes'"
object-replicator: 1/1 (100.00%) partitions replicated in 0.05s (19.73/sec, 0s remaining)
object-replicator: 1 successes, 0 failures
object-replicator: 1 suffixes checked - 0.00% hashed, 100.00% synced
object-replicator: Partition times: max 0.0467s, min 0.0467s, med 0.0467s
object-replicator: Object replication complete (once). (0.00 minutes)
object-replicator: Exited

* See the error, confirm that the object has not been replicated
$ ls -1 /srv/*/node/*/objects/80/d68/1419030ea4ea80b73e0719df514d3d68/
/srv/2/node/sdb2/objects/80/d68/1419030ea4ea80b73e0719df514d3d68/:
1492504501.71970.data
1492504506.09752.meta

/srv/4/node/sdb4/objects/80/d68/1419030ea4ea80b73e0719df514d3d68/:
1492504501.71970.data
1492504506.09752.meta
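
The final check above can also be scripted. The following is a minimal sketch, assuming the SAIO on-disk layout shown in the listings; the helper names list_replica_files and incomplete_replicas are hypothetical and are not part of Swift.

```python
import glob
import os


def list_replica_files(pattern):
    """Return {hash_dir: sorted file names} for each directory matching pattern."""
    return {
        path: sorted(os.listdir(path))
        for path in sorted(glob.glob(pattern))
        if os.path.isdir(path)
    }


def incomplete_replicas(replicas, suffixes=(".data", ".meta")):
    """Return the hash dirs that lack a file for at least one of the suffixes."""
    return [
        path
        for path, names in replicas.items()
        if not all(any(name.endswith(s) for name in names) for s in suffixes)
    ]
```

Calling list_replica_files("/srv/*/node/*/objects/80/d68/1419030ea4ea80b73e0719df514d3d68") and comparing the number of returned directories with the replica count (3 in the first listing) shows whether a replica is still missing. A replica whose directory was removed does not appear in the result at all, so the count is the signal; incomplete_replicas only flags directories that exist but lack a .data or .meta file.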

Consequences:
* the replica/fragment for this object will never be replicated/reconstructed
* after each rebalance, the data will not be moved to its new location, so some objects cannot be found even though they still exist in the cluster

Proposed fix: https://review.openstack.org/#/c/456921/

clayg (clay-gerrard)
Changed in swift:
importance: Undecided → High
Alistair Coles (alistair-coles) wrote :

A similar scenario occurs when one replica/fragment failed to receive the POST update and still has only a .data file with an expired x-delete-at time. In that case the ssync receiver gets a DiskFileDeleted when it checks its local .data file and does not request any metadata update from the ssync sender.

Changed in swift:
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to swift (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/460073

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/456921
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=38d35797df1d18d58eed5b537faa3696762c2e2a
Submitter: Jenkins
Branch: master

commit 38d35797df1d18d58eed5b537faa3696762c2e2a
Author: Romain LE DISEZ <email address hidden>
Date: Fri Apr 14 17:21:22 2017 +0200

    Fix SSYNC failing to replicate unexpired object

    Fix a situation where SSYNC would fail to replicate a valid object because
    the datafile contains an expired X-Delete-At information while a metafile
    contains no X-Delete-At information. Example:
     - 1454619054.02968.data => contains X-Delete-At: 1454619654
     - 1454619056.04876.meta => does not contain X-Delete-At info

    In this situation, the replicator tries to PUT the datafile, and then
    to POST the metadata. Previously, if the receiver has the datafile but
    current time is greater than the X-Delete-At, then it considers it to
    be expired and requests no updates from the sender, so the metafile is
    never synced. If the receiver does not have the datafile then it does
    request updates from the sender, but the ssync PUT subrequest is
    refused if the current time is greater than the X-Delete-At (the
    object is expired). If the datafile is transferred, the ssync POST
    subrequest fails because the object does not exist (expired).

    This commit allows PUT and POST to work so that the object can be
    replicated, by enabling the receiver object server to open expired
    diskfiles when handling replication requests.

    Closes-Bug: #1683689
    Co-Authored-By: Alistair Coles <email address hidden>
    Change-Id: I919994ead2b20dbb6c5671c208823e8b7f513715
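
The situation described in the commit message can be illustrated with a small model. This is a simplified sketch, not Swift's diskfile code: the names DATAFILE_KEYS, Expired, merge_metadata and open_object are hypothetical, and it assumes that metadata from a newer .meta file replaces the user-settable metadata of the .data file, including X-Delete-At. The open_expired argument stands in for the behaviour the commit enables for replication requests.

```python
class Expired(Exception):
    """Raised when the effective metadata says the object has expired."""


# Hypothetical subset of keys that are always taken from the .data file.
DATAFILE_KEYS = {"Content-Length", "ETag"}


def merge_metadata(data_meta, post_meta=None):
    """Combine .data metadata with the metadata of a newer .meta file, if any."""
    if post_meta is None:
        return dict(data_meta)
    # Assumption: a POST replaces everything except the data-file-owned keys.
    merged = {k: v for k, v in data_meta.items() if k in DATAFILE_KEYS}
    merged.update(post_meta)
    return merged


def open_object(data_meta, post_meta=None, now=0, open_expired=False):
    """Return the effective metadata, or raise Expired unless open_expired is set."""
    metadata = merge_metadata(data_meta, post_meta)
    delete_at = metadata.get("X-Delete-At")
    if delete_at is not None and int(delete_at) <= now and not open_expired:
        raise Expired(delete_at)
    return metadata


# Values taken from the example in the commit message.
data_meta = {"Content-Length": "4", "X-Delete-At": "1454619654"}
post_meta = {"X-Object-Meta-Test": "test"}
now = 1454619700  # any time after the original X-Delete-At
```

In this model, open_object(data_meta, post_meta, now=now) succeeds on a node that holds both files, while open_object(data_meta, None, now=now) raises Expired on a node that holds only the .data file, or that has received the .data file but not yet the .meta file. Passing open_expired=True lets that intermediate state be opened, which is what allows the subsequent metadata update to be applied.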

Changed in swift:
status: Confirmed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.15.0

This issue was fixed in the openstack/swift 2.15.0 release.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to swift (master)

Reviewed: https://review.openstack.org/460073
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=e109c7800fcd22c48800cd6e18943b32b49d5e0b
Submitter: Jenkins
Branch: master

commit e109c7800fcd22c48800cd6e18943b32b49d5e0b
Author: Alistair Coles <email address hidden>
Date: Wed Apr 26 12:25:55 2017 +0100

    Add probe test for ssync of unexpired metadata to an expired object

    Verify that metadata can be sync'd to a frag that has missed a POST
    and consequently that frag appears to be expired, when in fact the
    POST removed the X-Delete-At header.

    Tests the fix added by the Related-Change.

    Related-Bug: #1683689
    Related-Change: I919994ead2b20dbb6c5671c208823e8b7f513715
    Co-Authored-By: Clay Gerrard <email address hidden>

    Change-Id: I9af9fc26098893db4043cc9a8d05d772772d4259
