object replicator update_deleted post ssync REPLICATE request considered harmful
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Object Storage (swift) | Fix Released | Undecided | Unassigned | | |
Bug Description
To summarize ongoing work on rebalance improvements [1]: REPLICATE requests have a very coarse API. They are ostensibly designed to speed up replication, but are known to cause a significant amount of IO contention and can be slow UNDER SOME CIRCUMSTANCES.
One such circumstance is when using write affinity and SSYNC.
Often the "handoff part" in a write affinity cluster will be very sparse (a single object?) compared to the remote partition, which may be very dense (consider LOSF). Currently the replicator update_deleted and reconstructor revert methods both fire a REPLICATE request after syncing a partition to a remote primary, to cause an immediate invalidation and recalculation of all synced suffixes. When using SSYNC this is not necessary, because any updated suffixes are invalidated inline while syncing objects.
With a write affinity cluster using rsync replication, the post REPLICATE request is unavoidable because rsync will ship new objects "underneath" the current suffix hashes.pkl - we must at a MINIMUM invalidate the suffixes that have been synced [2].
However, when a write affinity cluster is using SSYNC, the post REPLICATE request is an unfavorable IO trade-off: it often takes more IO than the object transfer itself, with less granular concurrency control to shape the IO budget.
Since the reconstructor can never use rsync we should remove the post-revert-
When the replicator is using SSYNC we should skip the post-update-
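The proposed behaviour can be sketched roughly as follows. This is only an illustration of the decision, not the actual Swift code: `needs_post_sync_replicate`, `revert_handoff`, and the `sync` / `send_replicate` callables are hypothetical names standing in for whatever the replicator and reconstructor really use.

```python
def needs_post_sync_replicate(sync_method):
    """Return True when a follow-up REPLICATE request is still required.

    Hypothetical helper: rsync ships files "underneath" hashes.pkl, so the
    remote must be told to invalidate; ssync invalidates updated suffixes
    inline while syncing objects.
    """
    return sync_method != "ssync"


def revert_handoff(sync_method, suffixes, sync, send_replicate):
    """Sync a handoff partition, then invalidate remotely only if needed.

    `sync` and `send_replicate` are stand-in callables (assumptions), each
    taking the list of suffixes that were pushed to the remote primary.
    """
    if not sync(suffixes):
        # Nothing was pushed successfully, so there is nothing to invalidate.
        return False
    if needs_post_sync_replicate(sync_method):
        send_replicate(suffixes)
    return True
```

With `sync_method` set to `"ssync"` the sketch never calls `send_replicate`, which is the IO saving argued for above; with `"rsync"` the request is still sent for the synced suffixes.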
1. https:/
2. It's not obvious at all that we SHOULD only do suffix invalidation when using rsync, as one of the valuable "side-effects" of suffix recalculation (despite being expensive) is reaping old data files that are "over-written" by newer tombstones.
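To make the "reaping" side-effect in footnote 2 concrete, here is a deliberately simplified sketch. It assumes a hash directory holding only `<timestamp>.data` and `<timestamp>.ts` files, and it ignores .meta files, timestamp offsets, and reclaim age. `obsolete_files` is a hypothetical helper, not Swift's real cleanup routine.

```python
def obsolete_files(filenames):
    """Return files superseded by the newest tombstone in one hash dir.

    Simplified model (assumption): each name is '<timestamp>.data' or
    '<timestamp>.ts', and anything older than the newest tombstone is stale.
    """
    def timestamp(name):
        # '1401811134.87365.data' -> 1401811134.87365
        return float(name.rsplit(".", 1)[0])

    tombstones = [f for f in filenames if f.endswith(".ts")]
    if not tombstones:
        return []
    newest_ts = max(tombstones, key=timestamp)
    cutoff = timestamp(newest_ts)
    # Everything strictly older than the newest tombstone can be reaped.
    return sorted(f for f in filenames
                  if f != newest_ts and timestamp(f) < cutoff)
```

If suffix recalculation is skipped on the remote, cleanup of this kind has to be triggered some other way, which is the trade-off the footnote raises.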
This issue was fixed in the openstack/swift 2.27.0 release.