RSYNC: Probable race condition in replication/reconstruction can lead to loss of datafile

Bug #1903917 reported by Romain LE DISEZ
Affects: OpenStack Object Storage (swift)
Status: Confirmed
Importance: High
Assigned to: Unassigned

Bug Description

Related bug: 1897177

During a rebalance, a reproducible scenario was found with SSYNC for both the replicator and the reconstructor. That scenario leads to data file loss (see bug 1897177). As the workflow is similar, it is highly probable that the same bug applies to RSYNC replication.

The scenario is as follows (revert means replicating a handoff partition and then deleting it); a minimal sketch of the interleaving is shown after the list:
  1. A gets and reloads the new ring
  2. A starts to revert the partition P to node B
  3. B (with the old ring) starts to revert the (partial) partition P to node A
     => replication should be fast as all objects are already on node A
  4. B finishes replication of the (partial) partition P to node A
  5. B removes the (partial) partition P after the revert succeeded
  6. A finishes the revert of partition P to node B
  7. A removes partition P
  8. B gets and reloads the new ring
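
Here is a minimal sketch of that interleaving in plain Python. It is hypothetical code, not Swift internals: the sets stand in for the object hashes each node compares before deciding the remote side already has everything, and the two deletions mirror steps 5 and 7.

node_a = {"obj1", "obj2"}        # partition P on node A
node_b = {"obj1", "obj2"}        # (partial) partition P on node B

# Steps 2-3: each side compares its hashes against the other and finds
# nothing missing, so its revert is trivially "successful".
a_sees_b_in_sync = node_b >= node_a   # True: nothing for A to push
b_sees_a_in_sync = node_a >= node_b   # True: nothing for B to push

# Steps 4-5: B's revert succeeded, so B removes its copy of P.
if b_sees_a_in_sync:
    node_b = set()

# Steps 6-7: A's revert also "succeeded" (based on the comparison made
# before B deleted), so A removes its copy of P as well.
if a_sees_b_in_sync:
    node_a = set()

print(node_a, node_b)   # set() set() -> no copy of the data file remains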

Revision history for this message
Tim Burke (1-tim-z) wrote:

The repro is a little more involved than the reconstructor case, but yeah, this can definitely happen. First up, I hacked up handoffs_first to be handoffs_only:

diff --git a/swift/obj/replicator.py b/swift/obj/replicator.py
index dcab26fe1..a8891124c 100644
--- a/swift/obj/replicator.py
+++ b/swift/obj/replicator.py
@@ -917,7 +918,8 @@ class ObjectReplicator(Daemon):
         random.shuffle(jobs)
         if self.handoffs_first:
             # Move the handoff parts to the front of the list
-            jobs.sort(key=lambda job: not job['delete'])
+            jobs = [job for job in jobs if job['delete']]
         self.job_count = len(jobs)
         return jobs
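
To make the effect of that hack concrete, here is a tiny illustration with made-up job dicts (the partition numbers are hypothetical): the stock handoffs_first behaviour sorts handoff ("delete") jobs to the front but still processes primaries, while the replacement list comprehension drops everything that is not a handoff, i.e. handoffs_only.

jobs = [
    {"partition": "100", "delete": False},   # primary partition
    {"partition": "101", "delete": True},    # handoff to revert
    {"partition": "102", "delete": False},
]

# Stock behaviour: handoffs first, primaries still follow.
handoffs_first = sorted(jobs, key=lambda job: not job["delete"])

# Hacked behaviour: primaries are dropped entirely.
handoffs_only = [job for job in jobs if job["delete"]]

print([j["partition"] for j in handoffs_first])   # ['101', '100', '102']
print([j["partition"] for j in handoffs_only])    # ['101']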

Then widened the race:

diff --git a/swift/obj/replicator.py b/swift/obj/replicator.py
index dcab26fe1..a8891124c 100644
--- a/swift/obj/replicator.py
+++ b/swift/obj/replicator.py
@@ -587,6 +587,7 @@ class ObjectReplicator(Daemon):
             self.logger.timing_since('partition.delete.timing', begin)

     def delete_partition(self, path):
+        time.sleep(10)
         self.logger.info(_("Removing partition: %s"), path)
         try:
             tpool.execute(shutil.rmtree, path)

Letting the replicators run for a bit, I'm down to only two data files.
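
One way to double-check the damage is to count the remaining .data files across the devices. This is a rough, hypothetical sketch assuming a SAIO-style layout with all devices under /srv/node, not a command from the bug report.

import os

def count_data_files(devices_root="/srv/node"):
    """Count .data files under every objects tree below the devices root."""
    total = 0
    for dirpath, _dirs, files in os.walk(devices_root):
        if "/objects" in dirpath:
            total += sum(1 for name in files if name.endswith(".data"))
    return total

print(count_data_files())   # e.g. 2 when more replicas were expected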

Changed in swift:
status: New → Confirmed
importance: Undecided → High