Comment 0 for bug 1652323

Alistair Coles (alistair-coles) wrote : ssync syncs an expired object as a tombstone, probe test_object_expirer fails

With replication method = ssync:

When the ssync sender on node A syncs an expired object t0.data, which has an *already expired* delete-at header value of t_expire, it sends a DELETE subrequest that generates a tombstone on the receiver node B at t0.
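
As background for why the sender takes that path: in swift.common.exceptions, DiskFileExpired is a subclass of DiskFileDeleted, so the sender's single handler for DiskFileDeleted also catches the expired case and responds with a DELETE subrequest. A toy sketch of that behaviour (current_sync_action is a made-up helper, not the actual ssync_sender code; it assumes a Swift checkout is importable):

    from swift.common.exceptions import DiskFileDeleted, DiskFileExpired

    print(issubclass(DiskFileExpired, DiskFileDeleted))  # True

    def current_sync_action(exc_class):
        # Toy stand-in for the sender's handling when diskfile open() fails.
        try:
            raise exc_class()
        except DiskFileDeleted:
            # DiskFileExpired lands here too, so the receiver gets a
            # tombstone even though the sender still holds a .data file.
            return 'send_delete'

    print(current_sync_action(DiskFileExpired))  # 'send_delete'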

So after sync we have t0.data on sender node A and t0.ts on receiver node B. That's not good.

When the expirer runs and tries to delete the expired object, the expirer's DELETE to node A succeeds and node A gets t_expire.ts. The expirer's DELETE to node B fails with 412 because the tombstone t0.ts on node B does not have an expires-at value matching the x-if-delete-at header sent by the expirer. So the result is t_expire.ts on node A and t0.ts on node B.
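
To make the 412 concrete, here is a simplified, illustrative sketch of the precondition check (check_if_delete_at and the sample epoch values are invented for illustration; this is not the object server's real code): the expirer's conditional DELETE carries x-if-delete-at, which is compared against the x-delete-at stored with the object, and the ssync-generated tombstone carries no x-delete-at at all.

    def check_if_delete_at(request_headers, on_disk_metadata):
        # Return an HTTP status for the expirer's conditional DELETE.
        expected = request_headers.get('X-If-Delete-At')
        if expected is not None:
            stored = on_disk_metadata.get('X-Delete-At')
            if stored is None or int(stored) != int(expected):
                return 412  # precondition failed
        return 204  # delete accepted

    # Node A still has t0.data carrying the original X-Delete-At -> 204
    print(check_if_delete_at({'X-If-Delete-At': '1482249600'},
                             {'X-Delete-At': '1482249600'}))

    # Node B has only the tombstone t0.ts, which has no X-Delete-At -> 412
    print(check_if_delete_at({'X-If-Delete-At': '1482249600'}, {}))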

The next time the replicator runs, this anomaly will be corrected and both nodes will end up with t_expire.ts. However, apart from the fact that a replication process should not generate inconsistent state in the first place, the anomaly has undesirable side effects:

1. The expirer's DELETE to node B fails and is therefore retried (by default 3 times).

2. Because the expirer's DELETE to node B fails, some container db listings are not updated with the delete, so container listings remain inconsistent after expiration.

This can all be seen playing out in test/probe/test_object_expirer.py:TestObjectExpirer.test_expirer_delete_returns_outdated_412, which fails when the replication method is ssync but passes with rsync.

The solution is likely to be for the ssync sender to be more discriminating when it opens a diskfile and gets a DiskFileDeleted exception, here:

https://github.com/openstack/swift/blob/3218f8b064e462d901466b04a4813e15ec96da85/swift/obj/ssync_sender.py#L349-L351

When the exception is DiskFileExpired, the sender should probably attempt send_put/post rather than send_delete.
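
To make that concrete, here is a toy sketch of the proposed handling (proposed_sync_action is a made-up counterpart to current_sync_action above, not real ssync_sender code); because DiskFileExpired subclasses DiskFileDeleted, the more specific exception has to be caught first:

    from swift.common.exceptions import DiskFileDeleted, DiskFileExpired

    def proposed_sync_action(exc_class):
        # Toy stand-in: catch DiskFileExpired before DiskFileDeleted and
        # choose a PUT/POST instead of a DELETE.
        try:
            raise exc_class()
        except DiskFileExpired:
            # Expired, but the .data file is still on the sender: replicate
            # the object and its x-delete-at metadata.
            return 'send_put'
        except DiskFileDeleted:
            # Genuinely deleted: a tombstone on the receiver is correct.
            return 'send_delete'

    print(proposed_sync_action(DiskFileExpired))  # 'send_put'
    print(proposed_sync_action(DiskFileDeleted))  # 'send_delete'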