Comment 1 for bug 1903917

Tim Burke (1-tim-z) wrote :

The repro is a little more involved than the reconstructor case, but yeah, this can definitely happen. First up, I hacked up handoffs_first to behave as handoffs_only:

diff --git a/swift/obj/replicator.py b/swift/obj/replicator.py
index dcab26fe1..a8891124c 100644
--- a/swift/obj/replicator.py
+++ b/swift/obj/replicator.py
@@ -917,7 +918,8 @@ class ObjectReplicator(Daemon):
         random.shuffle(jobs)
         if self.handoffs_first:
             # Move the handoff parts to the front of the list
-            jobs.sort(key=lambda job: not job['delete'])
+            jobs = [job for job in jobs if job['delete']]
         self.job_count = len(jobs)
         return jobs
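
To make the effect of that one-line change concrete, here is a minimal standalone sketch. The job dicts and the 'partition' key are made up for illustration; only the 'delete' flag, which marks handoff jobs, comes from the diff above.

```python
import random

# Hypothetical job list; only the 'delete' key mirrors the diff above.
jobs = [
    {'partition': '1', 'delete': False},
    {'partition': '2', 'delete': True},
    {'partition': '3', 'delete': False},
    {'partition': '4', 'delete': True},
]
random.shuffle(jobs)

# handoffs_first (original): handoff jobs move to the front,
# but the primary jobs are still in the list and run afterwards.
handoffs_first = sorted(jobs, key=lambda job: not job['delete'])

# handoffs_only (the hack): primary jobs are dropped entirely.
handoffs_only = [job for job in jobs if job['delete']]
```

With the hack in place, every pass of the replicator works only on partitions it intends to remove, which makes the problem much easier to hit.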

Then widened the race:

diff --git a/swift/obj/replicator.py b/swift/obj/replicator.py
index dcab26fe1..a8891124c 100644
--- a/swift/obj/replicator.py
+++ b/swift/obj/replicator.py
@@ -587,6 +587,7 @@ class ObjectReplicator(Daemon):
             self.logger.timing_since('partition.delete.timing', begin)

     def delete_partition(self, path):
+        time.sleep(10)
         self.logger.info(_("Removing partition: %s"), path)
         try:
             tpool.execute(shutil.rmtree, path)

Letting the replicators run for a bit, I'm down to only two data files.
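
For readers who want to see why the sleep matters, the following is a generic sketch of a delayed delete, not Swift's actual code. It assumes the window is between deciding a partition is safe to remove and actually calling rmtree; the comment above does not spell out the exact window. The paths, file names, and the delete_partition stand-in are hypothetical.

```python
import os
import shutil
import tempfile
import threading
import time


def delete_partition(path, delay):
    # Stand-in for the patched method: wait, then remove the whole tree
    # without re-checking what is in it.
    time.sleep(delay)
    shutil.rmtree(path)


root = tempfile.mkdtemp()
part = os.path.join(root, 'part')
os.makedirs(part)
with open(os.path.join(part, 'old.data'), 'w') as f:
    f.write('already synced')

# Assumption: at this point the deleter has already decided that
# 'part' is safe to remove.
deleter = threading.Thread(target=delete_partition, args=(part, 0.5))
deleter.start()

# A new write lands during the delay; nothing looks at the partition again.
late = os.path.join(part, 'new.data')
with open(late, 'w') as f:
    f.write('never synced')

deleter.join()
late_survived = os.path.exists(late)  # the late write is gone with the tree
shutil.rmtree(root)
```

The longer the delay before rmtree, the more likely it is that something is written into the partition after the decision to delete it was made.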