repeating sorting is decreasing performance on rebalancing

Bug #1262166 reported by Nikolay Markov on 2013-12-18
22
This bug affects 5 people
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Undecided
Nikolay Markov

Bug Description

On Ubuntu + Python 2.7 swift rebalance takes up to 8 minutes to complete, instead of 1-2.

/bin/sh -c swift-ring-builder /etc/swift/container.builder rebalance

Looking through recent changes, I found that complexity and time spent on ring rebalancing was seriously increased with calling sorted() on each iteration through tiers depths

Nikolay Markov (nmarkov) on 2013-12-18
Changed in swift:
assignee: nobody → Nikolay Markov (nmarkov)
description: updated
Nikolay Markov (nmarkov) on 2013-12-18
description: updated
Samuel Merritt (torgomatic) wrote :

It did get quite a bit slower, but now it produces optimally-balanced rings in one try instead of needing many repeated rebalancings, so overall it's probably a win. See commit d3e2cb35db5c9d90c35ec6440a9ffd18f2d33aa4 for more details.

That said, there is some precedent for avoiding sort() calls in the ring builder. There are other places where things are kept in order from least-to-most-desirable, then the most-desirable one is removed with pop(), modified, and reinserted appropriately (using a binary search). That's certainly preferring speed over simplicity.

To see examples of this, look for calls to bisect.bisect_left() in the ring builder code.

Nikolay Markov (nmarkov) wrote :

Maybe it would be also appropriate to use heapq here?

Roman Podoliaka (rpodolyaka) wrote :

"...the most-desirable one is removed with pop(), modified, and reinserted appropriately (using a binary search). That's certainly preferring speed over simplicity..." agree with Nikolay here, sounds like a good task for a binary heap (inserts will be O(lg(N)) and no need to sort again)

Samuel Merritt (torgomatic) wrote :

Yeah, but heapq pops off the front of the list, and that's a fair bit slower than popping off the back of the list. See https://gist.github.com/smerritt/8045982 for a quick benchmark script.

That said, if heapq winds up being faster than what we're doing, go for it. I'm just pointing out that it might not win.

Reviewed: https://review.openstack.org/63315
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=34fa05f66ba79defe7bcdedfbb87556e21b25e08
Submitter: Jenkins
Branch: master

commit 34fa05f66ba79defe7bcdedfbb87556e21b25e08
Author: Yuriy Taraday <email address hidden>
Date: Fri Dec 20 08:57:43 2013 +0400

    Various optimizations to RingBuilder.rebalance()

    Overall gain is about 20-22% of time on my laptop. This includes:

    * replacing string sort_key with a tuple (because we can and because
      compairing two 3-tuples is faster than comparing two 26-byte strings);
    * keeping track of hungriest dev in tier (allows us to use built-in
      dict.__getitem__ as a key instead of lambdas in couple of places);
    * remove unnecessary sorted() call in the innermost loop (because we
      don't need to sort all of them or better we don't need to sort
      anything there);
    * memoize tiers for each dev (saves just a couple of percents but why
      not).

    Performance measurments have been done using this script:
    http://paste.openstack.org/show/55609/

    Related-Bug: #1262166
    Related-Bug: #1261659
    Change-Id: If38bd9fe82efc12b01e9aa146e0f4ab565fb6bea

Download full text (23.8 KiB)

Reviewed: https://review.openstack.org/66462
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=3895441afd1f8ca49a09a483f402a961009a8661
Submitter: Jenkins
Branch: feature/ec

commit bad52f11218a11978d1efb0832f164a60a363cc2
Author: Clay Gerrard <email address hidden>
Date: Fri Jan 10 00:31:55 2014 -0800

    Allow programmatic reloading of Swift hash path config

    New util's function validate_hash_conf allows you to programmatically reload
    swift.conf and hash path global vars HASH_PATH_SUFFIX and HASH_PATH_PREFIX
    when they are invalid.

    When you load swift.common.utils before you have a swift.conf there's no good
    way to force a re-read of swift.conf and repopulate the hash path config
    options - short of restarting the process or reloading the module - both of
    which are hard to unittest. This should be no worse in general and in some
    cases easier.

    Change-Id: I1ff22c5647f127f65589762b3026f82c9f9401c1

commit 7b9c283203479cb9916951e1ce1f466f197dea36
Author: Samuel Merritt <email address hidden>
Date: Fri Jan 10 12:57:53 2014 -0800

    Add missing license header to test file

    All the other tests have license headers, so this one should too.

    I picked 2013 for the copyright year because that's when "git log"
    says it was first and last touched.

    Change-Id: Idd41a179322a3383f6992e72d8ba3ecaabd05c47

commit 47fcc5fca2c5020b69f3c2c7f0a8032f6c77354a
Author: Christian Schwede <email address hidden>
Date: Fri Jan 10 07:14:43 2014 +0000

    Update account quota doc

    A note was added stating that the same limitations apply to
    account quotas as for container quotas. An example on uploads
    without a content-length headers was added.

    Related-Bug: 1267659
    Change-Id: Ic29b527cb71bf5903c2823844a1cf685ab6813dd

commit 6426f762d0d87063f9813630c620d880a4191046
Author: Peter Portante <email address hidden>
Date: Mon Dec 9 20:52:58 2013 -0500

    Raise diskfile.py module coverage to > 98%

    We attempt to get the code coverage (with branch coverage) to 100%,
    but fall short because due to interactions between coverage.py and
    CPython's peephole optimizer. See:

        https://bitbucket.org/ned/coveragepy/issue/198/continue-marked-as-not-covered

    In the main diskfile module, we remove the check for a valid
    "self._tmppath" since it is only one of a number of fields that could
    be verified and it was not worth trying to get coverage for it. We
    also remove the try / except around the close() method call in the
    DiskFileReader's app_iter_ranges() method since it will never be
    called in a context that will raise a quarantine exception (by
    definition ranges can't generate a quarantine event).

    We also:

    * fix where quarantine messages are checked to ensure the
      generator is actually executed before the check
    * in new and modified tests:
      * use assertTrue in place of assert_
      * use assertEqual in place of assertEquals
    * fix references to the reserved word "object"

    Change-Id: I6379be04adfc5012cb0b91748fb3ba3f11200b48

commit 5196eae...

Changed in swift:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers