Reconstructor jobs are ordered by disk instead of randomized

Bug #1491605 reported by Caleb Tennis
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

The reconstructor builds its jobs disk by disk and then runs them in that order, whereas the replicator essentially randomizes its jobs. The problem is that a single slow disk can bog down the reconstructor so that it makes no progress. Worse, imagine a scenario where it restarts every so often (hourly?) due to ring changes: it never gets past the one disk it's working on.
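To make the starvation scenario concrete, here is a toy simulation. It is only a sketch under stated assumptions: `devices_touched`, the `budget` of jobs finished before each restart, and the device names are all hypothetical, and nothing here is taken from the Swift code. It only counts which devices receive any attention across several restarts.

```python
import random


def devices_touched(jobs, budget, restarts, shuffle, seed=0):
    """Return the set of devices that get at least one job processed.

    jobs:     list of (device, partition) tuples, built in disk order
    budget:   jobs finished before each (hypothetical) restart
    restarts: how many times the pass starts over from the beginning
    shuffle:  whether the job list is shuffled at the start of each pass
    """
    rng = random.Random(seed)
    touched = set()
    for _ in range(restarts):
        order = list(jobs)
        if shuffle:
            rng.shuffle(order)
        # Only the first `budget` jobs run before the pass is restarted.
        for device, _part in order[:budget]:
            touched.add(device)
    return touched


# Four devices with 100 partitions each, listed disk by disk.
jobs = [("sd%s" % d, p) for d in "abcd" for p in range(100)]

ordered = devices_touched(jobs, budget=50, restarts=10, shuffle=False)
shuffled = devices_touched(jobs, budget=50, restarts=10, shuffle=True)

print("disk order:", sorted(ordered))
print("shuffled:  ", sorted(shuffled))
```

In disk order every restart begins again on the first device, so the others are never reached; with a shuffle each pass samples jobs from all devices.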

Tags: ec
Changed in swift:
importance: Undecided → Low
Changed in swift:
status: New → Confirmed
Changed in swift:
assignee: nobody → Paul Dardeau (paul-dardeau)
Changed in swift:
assignee: Paul Dardeau (paul-dardeau) → nobody
Revision history for this message
clayg (clay-gerrard) wrote :

I think we can fix this in collect_parts.

The trick is to do the quick `for policy; for part in listdir` loop to append to a list and shuffle it, then iterate over the shuffled list to do all the hard work.

In particular we want to avoid doing the isdir check on every partition directory before we start yielding out part_info dicts [1].

Ideally, though, once the part_info dicts start coming out, they would be splayed randomly across policies and, most importantly, devices.

1. https://review.openstack.org/#/c/140178/
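The approach described in this comment could be sketched roughly as follows. This is not the merged patch: the `collect_parts` signature, the shape of `policy_dirs`, and the keys of the yielded dict are assumptions made for illustration. The point is the ordering of the work: a cheap listdir pass, a shuffle, and only then the per-partition isdir check, done lazily as each item is yielded.

```python
import os
import random
import tempfile


def collect_parts(policy_dirs, rng=random):
    """Yield part_info-style dicts in random order across policies and devices.

    policy_dirs: list of (policy, device, objects_dir) tuples. This shape is
    hypothetical; the real reconstructor derives it from rings and config.
    """
    candidates = []
    # Cheap pass: one listdir per device, no per-partition stat calls.
    for policy, device, objects_dir in policy_dirs:
        try:
            names = os.listdir(objects_dir)
        except OSError:
            continue  # missing or unreadable directory; skip this device
        for name in names:
            candidates.append((policy, device, objects_dir, name))

    rng.shuffle(candidates)

    # Expensive pass: check each entry only right before yielding it.
    for policy, device, objects_dir, name in candidates:
        part_path = os.path.join(objects_dir, name)
        if not (name.isdigit() and os.path.isdir(part_path)):
            continue  # not a partition directory
        yield {
            "policy": policy,
            "device": device,
            "partition": int(name),
            "part_path": part_path,
        }


# Demo on a throwaway tree: two devices, three partitions each, one stray file.
root = tempfile.mkdtemp()
policy_dirs = []
for device in ("sda", "sdb"):
    objects_dir = os.path.join(root, device, "objects")
    for part in range(3):
        os.makedirs(os.path.join(objects_dir, str(part)))
    open(os.path.join(objects_dir, "stray_file"), "w").close()
    policy_dirs.append((0, device, objects_dir))

found = list(collect_parts(policy_dirs, rng=random.Random(42)))
print([(p["device"], p["partition"]) for p in found])
```

Because the function is a generator, the first part_info comes out after only the listdir calls and a single isdir check, rather than after every partition directory on every device has been checked.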

Changed in swift:
importance: Low → High
Revision history for this message
clayg (clay-gerrard) wrote :

also related to lp bug #1653018

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/425468
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=2f0ab78f9ff85483a157c9cbb17b50eeff539ef9
Submitter: Jenkins
Branch: master

commit 2f0ab78f9ff85483a157c9cbb17b50eeff539ef9
Author: Clay Gerrard <email address hidden>
Date: Wed Jan 25 11:45:55 2017 -0800

    Shuffle disks and parts in reconstructor

    The main problem with going disk by disk is that it means all of your
    I/O is only on one spindle at a time and no matter how high you set
    concurrency it doesn't go any faster.

    Closes-Bug: #1491605

    Change-Id: I69e4c4baee64fd2192cbf5836b0803db1cc71705

Changed in swift:
status: Confirmed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.13.0

This issue was fixed in the openstack/swift 2.13.0 release.
