2017-06-08 08:57:31
Alistair Coles
description
With replication method = ssync:
When the ssync sender on node A syncs an expired object t0.data which has an *expired* delete-at header value of t_expire, it sends a DELETE subrequest, which generates a tombstone t0.ts on the receiver node B at t0.
So after sync we have t0.data on sender node A and t0.ts on receiver node B. That's not good.
When the expirer runs and tries to delete the expired object, the expirer's DELETE to node A succeeds and node A gets t_expire.ts. The expirer's DELETE to node B fails with 412 because the tombstone t0.ts on node B does not have an X-Delete-At value that matches the x-if-delete-at header sent by the expirer. So the result is t_expire.ts on node A and t0.ts on node B.
The next time the replicator runs, this anomaly will be corrected and both nodes will end up with t_expire.ts. However, apart from the fact that a replication process should never generate inconsistent state in the first place, the anomaly has undesirable side-effects:
1. The expirer DELETE to node B fails and is therefore retried (by default 3 times)
2. Because the expirer DELETE to node B fails, some container db listings are not updated with the delete, so container listings remain inconsistent after expiration.
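The reason node B answers 412 can be sketched as follows. This is an illustrative simplification, not Swift's actual object-server code: `check_if_delete_at` and the dict arguments are invented names, standing in for the server-side check that compares the request's x-if-delete-at header against the on-disk object's X-Delete-At metadata.

```python
# Simplified sketch (illustrative names, not Swift's real code) of the
# object server's conditional-delete check for x-if-delete-at.

def check_if_delete_at(on_disk_metadata, request_headers):
    """Return an HTTP status for a DELETE, honoring x-if-delete-at."""
    if 'x-if-delete-at' not in request_headers:
        return 204  # unconditional delete proceeds
    # A tombstone (or any file without X-Delete-At metadata) cannot
    # match, so the conditional delete is rejected.
    if 'x-delete-at' not in on_disk_metadata:
        return 412
    if int(on_disk_metadata['x-delete-at']) != \
            int(request_headers['x-if-delete-at']):
        return 412
    return 204

# Node A still holds t0.data with X-Delete-At = t_expire: delete succeeds.
assert check_if_delete_at({'x-delete-at': '1496900000'},
                          {'x-if-delete-at': '1496900000'}) == 204
# Node B holds the tombstone t0.ts, which carries no X-Delete-At
# metadata, so the expirer's conditional DELETE gets 412.
assert check_if_delete_at({}, {'x-if-delete-at': '1496900000'}) == 412
```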
(Until https://review.openstack.org/416384 was merged, this could all be seen to play out with test/probe/test_object_expirer.py:TestObjectExpirer.test_expirer_delete_returns_outdated_412, which failed if the replication method was ssync but passed with rsync)
The solution is likely to be for the ssync sender to be more discriminating when it opens a diskfile and gets a DiskFileDeleted exception, here:
https://github.com/openstack/swift/blob/3218f8b064e462d901466b04a4813e15ec96da85/swift/obj/ssync_sender.py#L349-L351
When the exception is DiskFileExpired, the sender should probably attempt send_put/post rather than send_delete.
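One possible shape of that change is sketched below, with stub exception classes so the control flow can be run standalone; `choose_sync_action`, `raiser`, and the stubs are illustrative, not Swift's actual code. In Swift, DiskFileExpired is a subclass of DiskFileDeleted, which is why the existing `except DiskFileDeleted` clause at the linked lines also swallows expired files; catching the more specific exception first would let the sender replicate the expired .data file instead of synthesizing a tombstone.

```python
# Illustrative sketch of the suggested ssync_sender change, not Swift's
# real code. Stub exceptions mirror the real class hierarchy:
# DiskFileExpired subclasses DiskFileDeleted.

class DiskFileDeleted(Exception):
    """Stub: raised when opening a diskfile finds a tombstone."""

class DiskFileExpired(DiskFileDeleted):
    """Stub: raised when a .data file exists but its X-Delete-At has passed."""

def choose_sync_action(open_diskfile):
    """Decide which ssync subrequest to send after trying to open a diskfile."""
    try:
        open_diskfile()
    except DiskFileExpired:
        # Expired object: send the .data file as-is, so the receiver
        # ends up with t0.data rather than a t0.ts tombstone.
        return 'send_put'
    except DiskFileDeleted:
        # Genuine tombstone: replicate the delete.
        return 'send_delete'
    return 'send_put'

def raiser(exc):
    def _raise():
        raise exc
    return _raise

assert choose_sync_action(raiser(DiskFileExpired())) == 'send_put'
assert choose_sync_action(raiser(DiskFileDeleted())) == 'send_delete'
```

With this ordering, the sender would replicate t0.data to node B, both nodes would hold matching .data files, and the expirer's conditional DELETE would succeed on both.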
|