OpenStack Object Storage (swift)

Reconstructor jobs are ordered by disk instead of randomized

Bug #1491605 reported by Caleb Tennis on 2015-09-02


Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

We create jobs disk by disk and then run them in that order (whereas the replicator essentially randomizes its jobs). The problem with this is that a single slow disk can bog down the reconstructor so that it never makes progress. Now imagine a scenario where the reconstructor restarts every so often (every hour?) because of ring changes: it never gets past the one disk it's working on.
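To make the failure mode concrete, here is a minimal Python sketch of a toy model, not of swift's reconstructor: the device names, partition counts, and restart budget are invented. A daemon that is restarted after a fixed number of jobs never gets past the first disk when jobs are generated disk by disk, while a shuffled job list reaches every disk.

```python
import random

# Toy layout (hypothetical): 4 devices with 1000 partitions each.
DEVICES = ["sda", "sdb", "sdc", "sdd"]
PARTS_PER_DEVICE = 1000


def disk_ordered_jobs():
    """Yield (device, partition) jobs one whole disk at a time."""
    for device in DEVICES:
        for part in range(PARTS_PER_DEVICE):
            yield device, part


def shuffled_jobs(rng):
    """Yield the same jobs, shuffled across all devices."""
    jobs = [(device, part) for device in DEVICES
            for part in range(PARTS_PER_DEVICE)]
    rng.shuffle(jobs)
    yield from jobs


def devices_touched(make_jobs, restarts, jobs_per_run):
    """Simulate a daemon that is restarted (e.g. by a ring change) after only
    `jobs_per_run` jobs, and return the set of devices it ever reached."""
    touched = set()
    for _ in range(restarts):
        for i, (device, _part) in enumerate(make_jobs()):
            if i >= jobs_per_run:
                break  # restarted before the pass could finish
            touched.add(device)
    return touched


if __name__ == "__main__":
    rng = random.Random(42)
    print(sorted(devices_touched(disk_ordered_jobs, 24, 500)))
    # ['sda']: every restart begins again on the first disk
    print(sorted(devices_touched(lambda: shuffled_jobs(rng), 24, 500)))
    # ['sda', 'sdb', 'sdc', 'sdd']
```

In this model the disk-ordered daemon redoes the same 500 partitions on `sda` after every restart, which is the "never progresses" behaviour described in the report.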


John Dickinson (notmyname) on 2015-09-09

Changed in swift:
importance:	Undecided → Low

John Dickinson (notmyname) on 2015-09-09

Changed in swift:
status:	New → Confirmed

Paul Dardeau (paul-dardeau) on 2015-10-22

Changed in swift:
assignee:	nobody → Paul Dardeau (paul-dardeau)

Paul Dardeau (paul-dardeau) on 2015-10-22

Changed in swift:
assignee:	Paul Dardeau (paul-dardeau) → nobody


clayg (clay-gerrard) wrote on 2016-12-13:

I think we can fix this in `collect_parts`.

The trick is to do the quick `for policy; for part in listdir` loop, appending to a list, and shuffle it; then iterate over the shuffled list and do all the hard work.

In particular, we want to avoid doing the isdir check on every partition directory before we start yielding out part_info dicts [1].

Ideally, though, once the part_info dicts start coming out, they would be splayed randomly across policies and, most importantly, across devices.

1. https://review.openstack.org/#/c/140178/
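The two-phase approach suggested in this comment (a quick listing pass, a shuffle, then the expensive per-partition work) could look roughly like the following Python sketch. It is an illustration under assumptions, not the merged patch: `collect_parts_shuffled`, its parameters, and the dict keys are hypothetical, and the directory names `objects`/`objects-1` merely stand in for per-policy data directories.

```python
import os
import random
import tempfile


def collect_parts_shuffled(devices_root, devices, policy_dirs, rng=None):
    """Hypothetical two-phase job collection (not swift's actual collect_parts).

    Phase 1 is cheap: one os.listdir() per (device, policy dir) builds a flat
    list of candidate partition names. Phase 2 shuffles that list and only
    then does the per-partition work (here just an isdir check), yielding
    part_info-style dicts lazily so the first job can start right away.
    """
    rng = rng or random.Random()
    candidates = []
    for device in devices:
        for policy_dir in policy_dirs:
            obj_path = os.path.join(devices_root, device, policy_dir)
            try:
                names = os.listdir(obj_path)
            except OSError:
                continue  # missing or unreadable: nothing to collect here
            candidates.extend((device, policy_dir, name) for name in names)

    rng.shuffle(candidates)  # splay work across devices and policies

    for device, policy_dir, name in candidates:
        part_path = os.path.join(devices_root, device, policy_dir, name)
        if not os.path.isdir(part_path):  # deferred "hard work"
            continue
        yield {"device": device, "policy_dir": policy_dir,
               "partition": name, "part_path": part_path}


if __name__ == "__main__":
    devices = ["sda", "sdb", "sdc"]
    policy_dirs = ["objects", "objects-1"]
    with tempfile.TemporaryDirectory() as root:
        # Throwaway tree: 3 devices x 2 policy dirs x 5 partitions.
        for dev in devices:
            for pol in policy_dirs:
                for part in range(5):
                    os.makedirs(os.path.join(root, dev, pol, str(part)))
        jobs = list(collect_parts_shuffled(root, devices, policy_dirs))
        print(len(jobs))  # 30
        print([job["device"] for job in jobs[:6]])  # mixed devices, order varies
```

Because the shuffle happens after the cheap listing pass, the costly checks run in random order and are spread across devices and policies from the first job onward.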

Changed in swift:
importance:	Low → High


clayg (clay-gerrard) wrote on 2016-12-28:

Also related to LP bug #1653018.


clayg (clay-gerrard) wrote on 2017-01-26:

Fix is here: https://review.openstack.org/#/c/425468/


OpenStack Infra (hudson-openstack) wrote on 2017-02-01: Fix merged to swift (master)

Reviewed: https://review.openstack.org/425468
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=2f0ab78f9ff85483a157c9cbb17b50eeff539ef9
Submitter: Jenkins
Branch: master

commit 2f0ab78f9ff85483a157c9cbb17b50eeff539ef9
Author: Clay Gerrard <email address hidden>
Date: Wed Jan 25 11:45:55 2017 -0800

Shuffle disks and parts in reconstructor

    The main problem with going disk by disk is that it means all of your
    I/O is only on one spindle at a time and no matter how high you set
    concurrency it doesn't go any faster.

Closes-Bug: #1491605

Change-Id: I69e4c4baee64fd2192cbf5836b0803db1cc71705
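The commit message's point about a single spindle can be illustrated with a toy scheduling model. This Python sketch is a simplification under stated assumptions, not swift's scheduler: each job needs one unit of exclusive I/O on its device, a pool of `concurrency` workers pulls jobs in queue order, and the four devices with 100 jobs each are invented.

```python
import heapq
import random


def makespan(job_devices, concurrency, io_time=1.0):
    """Toy model (an assumption, not swift's scheduler).

    `concurrency` workers take jobs in queue order. Each job needs `io_time`
    of exclusive access to its device, so a device (one spindle) serves one
    job at a time and a worker whose device is busy simply waits.
    Returns the time at which the last job finishes.
    """
    worker_free = [0.0] * concurrency  # min-heap: when each worker is next idle
    device_free = {}                   # when each device is next idle
    finish = 0.0
    for device in job_devices:
        ready = heapq.heappop(worker_free)  # earliest available worker
        start = max(ready, device_free.get(device, 0.0))
        end = start + io_time
        device_free[device] = end
        heapq.heappush(worker_free, end)
        finish = max(finish, end)
    return finish


devices = ["sda", "sdb", "sdc", "sdd"]               # hypothetical devices
by_disk = [d for d in devices for _ in range(100)]   # 100 jobs per disk, disk by disk
shuffled = by_disk[:]
random.Random(0).shuffle(shuffled)

print(makespan(by_disk, concurrency=1))   # 400.0
print(makespan(by_disk, concurrency=8))   # 379.0, barely faster than serial
print(makespan(shuffled, concurrency=8))  # much lower: several spindles busy at once
```

With jobs ordered by disk, raising `concurrency` from 1 to 8 only trims the total from 400 to 379 time units, and that small gain comes from the brief overlap where one disk's jobs hand off to the next. The shuffled queue keeps several devices busy at the same time.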