Activity log for bug #1652323

Date | Who | What changed | Old value | New value | Message
2016-12-23 14:25:53 | Alistair Coles | bug | | | added bug
2016-12-23 14:26:15 | Alistair Coles | swift: importance | Undecided | Medium |
2017-01-03 23:59:02 | clayg | swift: status | New | Confirmed |
2017-06-08 08:56:03 | Alistair Coles | summary | ssync syncs an expired object as a tombstone, probe test_object_expirer fails | ssync syncs an expired object as a tombstone |
2017-06-08 08:57:31 | Alistair Coles | description | (old value below) | (new value below) |

Old value:

With replication method = ssync:

When the ssync sender on node A syncs an expired object t0.data, which has an *expired* delete-at header value of t_expire, it sends a DELETE subrequest that generates a tombstone at t0 on receiver node B. So after the sync we have t0.data on sender node A and t0.ts on receiver node B. That's not good.

When the expirer runs and tries to delete the expired object, the expirer's DELETE to node A succeeds and node A gets t_expire.ts. The expirer's DELETE to node B fails with 412 because the tombstone t0.ts on node B does not have a delete-at value that matches the x-if-delete-at header sent by the expirer. So the result is t_expire.ts on node A and t0.ts on node B.

The next time the replicator runs, this anomaly will be corrected and both nodes will end up with t_expire.ts. However, apart from the fact that a replication process should not generate inconsistent state, the anomaly has undesirable side effects:

1. The expirer DELETE to node B fails and is therefore retried (by default 3 times).
2. Because the expirer DELETE to node B fails, some container DB listings are not updated with the delete, so container listings remain inconsistent after expiration.

This can all be seen to play out with test/probe/test_object_expirer.py:TestObjectExpirer.test_expirer_delete_returns_outdated_412, which fails if the replication method is ssync but passes with rsync.

The solution is likely for the ssync sender to be more discriminating when it opens a diskfile and gets a DiskFileDeleted exception, here: https://github.com/openstack/swift/blob/3218f8b064e462d901466b04a4813e15ec96da85/swift/obj/ssync_sender.py#L349-L351

When the exception is DiskFileExpired, the sender should probably attempt send_put/post rather than send_delete.

New value:

With replication method = ssync:

When the ssync sender on node A syncs an expired object t0.data, which has an *expired* delete-at header value of t_expire, it sends a DELETE subrequest that generates a tombstone at t0 on receiver node B. So after the sync we have t0.data on sender node A and t0.ts on receiver node B. That's not good.

When the expirer runs and tries to delete the expired object, the expirer's DELETE to node A succeeds and node A gets t_expire.ts. The expirer's DELETE to node B fails with 412 because the tombstone t0.ts on node B does not have a delete-at value that matches the x-if-delete-at header sent by the expirer. So the result is t_expire.ts on node A and t0.ts on node B.

The next time the replicator runs, this anomaly will be corrected and both nodes will end up with t_expire.ts. However, apart from the fact that a replication process should not generate inconsistent state, the anomaly has undesirable side effects:

1. The expirer DELETE to node B fails and is therefore retried (by default 3 times).
2. Because the expirer DELETE to node B fails, some container DB listings are not updated with the delete, so container listings remain inconsistent after expiration.

(Until https://review.openstack.org/416384 was merged, this could all be seen to play out with test/probe/test_object_expirer.py:TestObjectExpirer.test_expirer_delete_returns_outdated_412, which failed if the replication method was ssync but passed with rsync.)

The solution is likely for the ssync sender to be more discriminating when it opens a diskfile and gets a DiskFileDeleted exception, here: https://github.com/openstack/swift/blob/3218f8b064e462d901466b04a4813e15ec96da85/swift/obj/ssync_sender.py#L349-L351

When the exception is DiskFileExpired, the sender should probably attempt send_put/post rather than send_delete.

2017-06-28 07:54:20 | Alistair Coles | swift: importance | Medium | High |
2017-08-17 22:37:39 | OpenStack Infra | swift: status | Confirmed | Fix Released |
2018-01-19 13:29:53 | OpenStack Infra | tags | | in-feature-s3api |
2018-01-22 22:23:32 | OpenStack Infra | tags | in-feature-s3api | in-feature-deep in-feature-s3api |
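The failure sequence in the description, and the proposed fix, can be modelled in a few lines of Python. This is a hypothetical, heavily simplified sketch, not Swift's code: DiskFileDeleted and DiskFileExpired are stand-ins for the exceptions named in the description, and ObjFile, open_diskfile, sync, expirer_delete and run are invented for illustration. It assumes, as the description implies, that DiskFileExpired is a subclass of DiskFileDeleted (so the sender's existing handler catches it), and that the receiver rejects the expirer's DELETE whenever the stored delete-at value differs from x-if-delete-at.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


class DiskFileDeleted(Exception):
    """Stand-in (hypothetical): the newest file for the object is a tombstone."""

    def __init__(self, timestamp: int) -> None:
        super().__init__(timestamp)
        self.timestamp = timestamp


class DiskFileExpired(DiskFileDeleted):
    """Stand-in (hypothetical): a .data file whose delete-at time has passed."""


@dataclass
class ObjFile:
    """Hypothetical model of the newest on-disk file for one object."""
    timestamp: int
    ext: str                          # ".data" or ".ts"
    delete_at: Optional[int] = None   # tombstones carry no delete-at value


def open_diskfile(f: ObjFile, now: int) -> ObjFile:
    """Hypothetical open(): raises for deleted or expired objects."""
    if f.ext == ".ts":
        raise DiskFileDeleted(f.timestamp)
    if f.delete_at is not None and f.delete_at <= now:
        raise DiskFileExpired(f.timestamp)
    return f


def sync(sender_file: ObjFile, now: int, fixed: bool) -> ObjFile:
    """Return the file the receiver holds after one simplified ssync pass."""
    try:
        open_diskfile(sender_file, now)
    except DiskFileExpired:
        if fixed:
            # Proposed behaviour: send the .data (keeping its delete-at) as a PUT.
            return ObjFile(sender_file.timestamp, ".data", sender_file.delete_at)
        # Current behaviour: handled like any DiskFileDeleted, i.e. a DELETE at t0.
        return ObjFile(sender_file.timestamp, ".ts")
    except DiskFileDeleted as err:
        return ObjFile(err.timestamp, ".ts")
    return ObjFile(sender_file.timestamp, ".data", sender_file.delete_at)


def expirer_delete(f: ObjFile, x_if_delete_at: int) -> Tuple[bool, ObjFile]:
    """Simplified conditional DELETE: (accepted, resulting file)."""
    if f.delete_at != x_if_delete_at:
        # Mismatch (always the case for a tombstone): the real server answers 412.
        return False, f
    return True, ObjFile(x_if_delete_at, ".ts")


def run(fixed: bool) -> Tuple[ObjFile, ObjFile, bool]:
    t0, t_expire, now = 100, 200, 300   # arbitrary illustrative timestamps
    node_a = ObjFile(t0, ".data", delete_at=t_expire)
    node_b = sync(node_a, now, fixed)   # ssync A -> B after the object expired
    ok_a, node_a = expirer_delete(node_a, t_expire)
    ok_b, node_b = expirer_delete(node_b, t_expire)
    return node_a, node_b, ok_a and ok_b


if __name__ == "__main__":
    for fixed in (False, True):
        a, b, ok = run(fixed)
        print(f"fixed={fixed}: A={a.timestamp}{a.ext} B={b.timestamp}{b.ext} expirer_ok={ok}")
```

With fixed=False the model leaves node B at 100.ts and rejects the expirer's DELETE to B, mirroring the 412 described in the bug; with fixed=True both nodes end at 200.ts. The real fix may differ in detail (for example, in how the sender reads metadata from an expired diskfile).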