object replicator collect_jobs loops over all replicas before each run

Bug #1395879 reported by Caleb Tennis
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: none listed

Bug Description

The object replicator's collect_jobs loops over all devices and calls os.listdir on each one. Imagine a node where replication is slow because of heavy IO. You can get into a cycle where there are so many partitions to look through that the replicator takes a very long time to start. I found that loading an objects directory with 50,000 partitions takes about 6-7 minutes, and that's just a single device.

I don't know what the right fix is here; maybe something more incremental. In addition, specifying -d or -p to limit the scope of the replicator doesn't bypass this step: it still loads all partitions first and THEN works only on the subset specified.
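
A minimal sketch of what "more incremental" could look like, offered only as an illustration and not as Swift's actual code or the fix that was eventually merged. It assumes a layout of `<devices_root>/<device>/objects/<partition>`; the function name `iter_partitions` and the parameters `override_devices` and `override_partitions` are hypothetical stand-ins for the -d and -p filters. The idea is to apply the filters before any directory is listed, and to yield partitions lazily so work can begin before every device has been scanned.

```python
import os
from typing import Iterator, Optional, Set, Tuple


def iter_partitions(
    devices_root: str,
    override_devices: Optional[Set[str]] = None,
    override_partitions: Optional[Set[str]] = None,
) -> Iterator[Tuple[str, str]]:
    """Lazily yield (device, partition) pairs, filtering before listing.

    All names here are hypothetical; this is not the Swift replicator API.
    """
    for device in sorted(os.listdir(devices_root)):
        # Skip unwanted devices before touching their objects directory.
        if override_devices and device not in override_devices:
            continue
        objects_dir = os.path.join(devices_root, device, "objects")
        if not os.path.isdir(objects_dir):
            continue
        if override_partitions:
            # Probe only the requested partitions instead of listing all.
            for partition in sorted(override_partitions):
                if os.path.isdir(os.path.join(objects_dir, partition)):
                    yield device, partition
            continue
        # os.scandir returns an iterator, so the caller can start on the
        # first entry without building a full list of partitions first.
        with os.scandir(objects_dir) as entries:
            for entry in entries:
                if entry.is_dir():
                    yield device, entry.name
```

Because the function is a generator, a caller can start replicating the first partition as soon as it is yielded. With both filters set, the cost is one existence check per requested partition rather than a full listing of every device.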

clayg (clay-gerrard) wrote :
Caleb Tennis (ctennis) wrote :

I think https://review.openstack.org/#/c/149384/ handles fixing this, so once that's merged this can be closed.

Samuel Merritt (torgomatic) wrote :

That change was merged.

Changed in swift:
status: New → Fix Released