OpenStack Object Storage (swift)

object replicator update_deleted post ssync REPLICATE request considered harmful

Bug #1818709 reported by clayg on 2019-03-05

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Object Storage (swift)	Fix Released	Undecided	Unassigned

Bug Description

To summarize ongoing work on rebalance improvements [1]: REPLICATE requests have a very coarse API. They are ostensibly designed to speed up replication, but are known to cause a significant amount of IO contention and can be slow UNDER SOME CIRCUMSTANCES.

One such circumstance is when using write-affinity and SSYNC.

Often the "handoff part" in a write affinity cluster will be very sparse (a single object?) compared to the remote partition, which may be very dense (consider LOSF). Currently the replicator's update_deleted and the reconstructor's revert methods both fire a REPLICATE request after syncing a partition to a remote primary, forcing an immediate invalidation and recalculation of all synced suffixes. When using SSYNC this is not necessary, because any updated suffixes are invalidated inline while syncing objects.

With a write affinity cluster using rsync replication the post REPLICATE request is unavoidable because rsync will ship new objects "underneath" the current suffix hashes.pkl - we must at a MINIMUM invalidate the suffixes that have been synced [2].
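
The difference between the two sync methods can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyPartition, rsync_write and ssync_write are hypothetical names, not Swift code, the dict stands in for hashes.pkl, and sha256 is just a stand-in digest.

```python
import hashlib


class ToyPartition:
    """Hypothetical in-memory stand-in for one partition on a remote primary."""

    def __init__(self):
        self.files = {}          # suffix -> set of file names "on disk"
        self.cached_hashes = {}  # suffix -> cached digest (None = invalidated)

    def _hash_suffix(self, suffix):
        # Digest over the sorted file names in the suffix (toy hashing scheme).
        digest = hashlib.sha256()
        for name in sorted(self.files.get(suffix, ())):
            digest.update(name.encode())
        return digest.hexdigest()

    def get_hashes(self, recalculate=()):
        # Roughly what a REPLICATE request asks for: invalidate the named
        # suffixes, then rehash every suffix whose cached value is missing.
        for suffix in recalculate:
            self.cached_hashes[suffix] = None
        for suffix in list(self.cached_hashes):
            if self.cached_hashes[suffix] is None:
                self.cached_hashes[suffix] = self._hash_suffix(suffix)
        return dict(self.cached_hashes)

    def rsync_write(self, suffix, name):
        # rsync drops the file in place; the hash cache is never told.
        self.files.setdefault(suffix, set()).add(name)

    def ssync_write(self, suffix, name):
        # The receiving side writes the object and invalidates the suffix inline.
        self.files.setdefault(suffix, set()).add(name)
        self.cached_hashes[suffix] = None


part = ToyPartition()
part.ssync_write("abc", "1551800000.00000.data")
before = part.get_hashes()

part.rsync_write("abc", "1551800100.00000.data")
stale = part.get_hashes()                       # cache still shows the old hash
fresh = part.get_hashes(recalculate=["abc"])    # explicit invalidation fixes it
```

In this model the cache only notices the rsync'd file after an explicit invalidation, while an ssync-style write is picked up by the next ordinary hash lookup.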

However, when a write affinity cluster is using SSYNC, the post REPLICATE request is an unfavorable IO trade-off: it often takes more IO than the object transfer itself, with less granular concurrency control to shape the IO budget.

Since the reconstructor can never use rsync, we should remove the post-revert-rehash-remote call.

When the replicator is using SSYNC we should skip the post-update-deleted-REPLICATE request.
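
The proposed behavior boils down to a small decision rule. A minimal sketch, assuming a hypothetical helper name and plain string values for the daemon and sync method (this is not the actual patch):

```python
def needs_post_sync_replicate(daemon, sync_method):
    """Return True if a follow-up REPLICATE request is still needed
    after a handoff partition has been synced to a remote primary.

    Hypothetical helper for illustration only; names and string
    values are assumptions, not Swift's internal API.
    """
    if daemon == "reconstructor":
        # The reconstructor can never use rsync, so suffixes are always
        # invalidated inline while syncing.
        return False
    if daemon == "replicator":
        # rsync ships files "underneath" hashes.pkl, so the remote must
        # at least be told to invalidate; SSYNC already did that inline.
        return sync_method != "ssync"
    raise ValueError("unknown daemon: %r" % (daemon,))


decisions = {
    ("replicator", "rsync"): needs_post_sync_replicate("replicator", "rsync"),
    ("replicator", "ssync"): needs_post_sync_replicate("replicator", "ssync"),
    ("reconstructor", "ssync"): needs_post_sync_replicate("reconstructor", "ssync"),
}
```

Only the replicator with rsync keeps the extra request; both SSYNC paths skip it.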

1. https://etherpad.openstack.org/p/swift-rebalance
2. It's not at all obvious that we SHOULD do only suffix invalidation when using rsync, as one of the valuable "side-effects" of suffix recalculation (despite being expensive) is reaping old data files that are "over-written" by newer tombstones.
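
The reaping side-effect mentioned in [2] can be sketched as follows. This is a simplified illustration with a hypothetical function name: it assumes file names of the form <timestamp>.data and <timestamp>.ts, and ignores .meta files, reclaim age, and EC fragment naming, all of which real cleanup has to handle.

```python
def files_obsoleted_by_tombstone(names):
    """Return the files in one object directory that are older than the
    newest tombstone (.ts) and could therefore be reaped.

    Toy model only; not Swift's actual on-disk cleanup logic.
    """
    def timestamp(name):
        # "1551800000.00000.data" -> 1551800000.0
        return float(name.rsplit(".", 1)[0])

    tombstones = [n for n in names if n.endswith(".ts")]
    if not tombstones:
        return []
    newest = max(timestamp(n) for n in tombstones)
    return sorted(n for n in names if timestamp(n) < newest)


reapable = files_obsoleted_by_tombstone(
    ["1551800000.00000.data", "1551800100.00000.ts"])
untouched = files_obsoleted_by_tombstone(
    ["1551800200.00000.data", "1551800100.00000.ts"])
```

If only invalidation happens and the suffix is never rehashed, nothing in this toy model would ever walk the directory and notice the stale .data file.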