RSYNC: Probable race condition in replication/reconstruction can lead to loss of datafile

Bug #1903917 reported by Romain LE DISEZ
Affects: OpenStack Object Storage (swift)
Status: Confirmed
Importance: High
Assigned to: Unassigned

Bug Description

Related bug: 1897177

During a rebalance, a reproducible scenario was found with SSYNC for both the replicator and the reconstructor. That scenario leads to data file loss (see bug 1897177). As the workflow is similar, it is highly probable that the same bug applies to RSYNC replication.

The scenario is as follows (revert means replicating a handoff partition and then deleting it); a minimal sketch of the interleaving is shown after the list:
  1. A gets and reloads the new ring
  2. A starts to revert the partition P to node B
  3. B (with the old ring) starts to revert the (partial) partition P to node A
     => replication should be fast as all objects are already on node A
  4. B finishes replication of the (partial) partition P to node A
  5. B removes the (partial) partition P after the revert succeeded
  6. A finishes the revert of partition P to node B
  7. A removes partition P
  8. B gets and reloads the new ring
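
Here is a minimal sketch of that interleaving in plain Python. It is hypothetical code, not Swift internals: the sets stand in for the object hashes each node compares before deciding the remote side already has everything, and the two deletions mirror steps 5 and 7.

node_a = {"obj1", "obj2"}        # partition P on node A
node_b = {"obj1", "obj2"}        # (partial) partition P on node B

# Steps 2-3: each side compares its hashes against the other and finds
# nothing missing, so its revert is trivially "successful".
a_sees_b_in_sync = node_b >= node_a   # True: nothing for A to push
b_sees_a_in_sync = node_a >= node_b   # True: nothing for B to push

# Steps 4-5: B's revert succeeded, so B removes its copy of P.
if b_sees_a_in_sync:
    node_b = set()

# Steps 6-7: A's revert also "succeeded" (based on the comparison made
# before B deleted), so A removes its copy of P as well.
if a_sees_b_in_sync:
    node_a = set()

print(node_a, node_b)   # set() set() -> no copy of the data file remains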

Revision history for this message
Tim Burke (1-tim-z) wrote:

The repro is a little more involved than the reconstructor case, but yeah, this can definitely happen. First up, I hacked up handoffs_first to be handoffs_only:

diff --git a/swift/obj/replicator.py b/swift/obj/replicator.py
index dcab26fe1..a8891124c 100644
--- a/swift/obj/replicator.py
+++ b/swift/obj/replicator.py
@@ -917,7 +918,8 @@ class ObjectReplicator(Daemon):
         random.shuffle(jobs)
         if self.handoffs_first:
             # Move the handoff parts to the front of the list
-            jobs.sort(key=lambda job: not job['delete'])
+            jobs = [job for job in jobs if job['delete']]
         self.job_count = len(jobs)
         return jobs
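
To make the effect of that hack concrete, here is a tiny illustration with made-up job dicts (the partition numbers are hypothetical): the stock handoffs_first behaviour sorts handoff ("delete") jobs to the front but still processes primaries, while the replacement list comprehension drops everything that is not a handoff, i.e. handoffs_only.

jobs = [
    {"partition": "100", "delete": False},   # primary partition
    {"partition": "101", "delete": True},    # handoff to revert
    {"partition": "102", "delete": False},
]

# Stock behaviour: handoffs first, primaries still follow.
handoffs_first = sorted(jobs, key=lambda job: not job["delete"])

# Hacked behaviour: primaries are dropped entirely.
handoffs_only = [job for job in jobs if job["delete"]]

print([j["partition"] for j in handoffs_first])   # ['101', '100', '102']
print([j["partition"] for j in handoffs_only])    # ['101']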

Then widened the race:

diff --git a/swift/obj/replicator.py b/swift/obj/replicator.py
index dcab26fe1..a8891124c 100644
--- a/swift/obj/replicator.py
+++ b/swift/obj/replicator.py
@@ -587,6 +587,7 @@ class ObjectReplicator(Daemon):
             self.logger.timing_since('partition.delete.timing', begin)

     def delete_partition(self, path):
+        time.sleep(10)
         self.logger.info(_("Removing partition: %s"), path)
         try:
             tpool.execute(shutil.rmtree, path)

Letting the replicators run for a bit, I'm down to only two data files.
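
One way to double-check the damage is to count the remaining .data files across the devices. This is a rough, hypothetical sketch assuming a SAIO-style layout with all devices under /srv/node, not a command from the bug report.

import os

def count_data_files(devices_root="/srv/node"):
    """Count .data files under every objects tree below the devices root."""
    total = 0
    for dirpath, _dirs, files in os.walk(devices_root):
        if "/objects" in dirpath:
            total += sum(1 for name in files if name.endswith(".data"))
    return total

print(count_data_files())   # e.g. 2 when more replicas were expected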

Changed in swift:
status: New → Confirmed
importance: Undecided → High