Swift rebalance is extremely slow in Ubuntu

Bug #1261659 reported by Vladimir Kuklin
Affects              Status         Importance  Assigned to     Milestone
Fuel for OpenStack   Fix Released   Critical    Nikolay Markov

Bug Description

On Ubuntu + Python 2.7 swift rebalance takes up to 8 minutes to complete, instead of 1-2.

/bin/sh -c swift-ring-builder /etc/swift/container.builder rebalance

Looking through recent changes, I found that the complexity and time of ring rebalancing increased significantly because sorted() is now called on each iteration through the tier depths.

Profiling results:

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  3145728   10.013    0.000   13.212    0.000 {sorted}
   786432    4.811    0.000    8.633    0.000 /usr/lib/python2.7/dist-packages/swift/common/ring/builder.py:906(_sort_key_for)
 12582912    6.418    0.000    6.418    0.000 {_bisect.bisect_left}
 12582912    4.933    0.000    4.933    0.000 {method 'pop' of 'list' objects}
 12582912    4.748    0.000    4.748    0.000 {method 'insert' of 'list' objects}

As far as I can see, this was done to improve the quality of the results, but the performance is still bad.
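For anyone who wants to collect a similar profile, here is a minimal sketch of producing a report ordered by cumulative time with the standard cProfile and pstats modules. It is written for Python 3 (the report above came from Python 2.7), and resort_every_step is a hypothetical stand-in workload that re-sorts a list on every iteration; it is not the Swift ring builder code.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # "cumulative" gives the same ordering as the report in the bug description.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


def resort_every_step(items):
    """Hypothetical workload: re-sorts the whole list on every iteration."""
    ordered = []
    for item in items:
        ordered.append(item)
        ordered = sorted(ordered)
    return ordered


if __name__ == "__main__":
    _, report = profile_call(resort_every_step, list(range(2000, 0, -1)))
    print(report)
```

Swapping the stand-in for the real call you want to measure should give a table in the same layout as the one above.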

Vladimir Kuklin (vkuklin) wrote :
Nikolay Markov (nmarkov)
Changed in fuel:
assignee: nobody → Nikolay Markov (nmarkov)
status: New → In Progress
Changed in fuel:
assignee: Nikolay Markov (nmarkov) → Bogdan Dobrelya (bogdando)
Bogdan Dobrelya (bogdando) wrote :

I can confirm there are issues in the Swift code in /swift/common/ring/builder.py, which is called from ./bin/swift-ring-builder, causing timeouts on rebalance.

Changed in fuel:
assignee: Bogdan Dobrelya (bogdando) → nobody
Changed in fuel:
assignee: nobody → Nikolay Markov (nmarkov)
importance: High → Critical
Changed in fuel:
assignee: Nikolay Markov (nmarkov) → Nikolay Korshenin (nkorshenin)
assignee: Nikolay Korshenin (nkorshenin) → Nikolay Markov (nmarkov)
Nikolay Markov (nmarkov) wrote :

Created bug in Swift project: https://bugs.launchpad.net/swift/+bug/1262166

description: updated
no longer affects: swift
Vladimir Kuklin (vkuklin) wrote :

Steps to reproduce:

install swift package from 1.10 tag

run:

rm /etc/swift/container.builder
swift-ring-builder /etc/swift/container.builder create 18 3 1
/usr/bin/swift-ring-builder /etc/swift/container.builder add z3-10.108.30.4:6001/2 1
/usr/bin/swift-ring-builder /etc/swift/container.builder add z3-10.108.30.4:6001/1 1
/usr/bin/swift-ring-builder /etc/swift/container.builder add z1-10.108.30.2:6001/2 1
/usr/bin/swift-ring-builder /etc/swift/container.builder add z1-10.108.30.2:6001/1 1
/usr/bin/swift-ring-builder /etc/swift/container.builder add z2-10.108.30.3:6001/2 1
/usr/bin/swift-ring-builder /etc/swift/container.builder add z2-10.108.30.3:6001/1 1
swift-ring-builder /etc/swift/container.builder rebalance

clayg (clay-gerrard) wrote :

If you look at the review for the change, we were fixing a bug in balancing and there was a speed cost:

https://review.openstack.org/#/c/41802/

We're always looking for ways to make balancing faster, but... *why* is this a critical "bug"?

Nikolay Markov (nmarkov) wrote :

clayg, we're launching it through Puppet with a bunch of processes running simultaneously on the same machine, and it begins to eat CPU and takes much more time than expected. This doesn't look like correct behaviour, and it leads to errors and timeouts in our tool.

Oleg S. Gelbukh (gelbuhos) wrote :

Nikolay, could you reduce the default partition power of the cluster from 18 to something based on the expected number of devices in your cluster? For example, if you are only going to have 1-2 devices per node, and only 3 nodes, you might have a partition power of 10-12 and be just fine. I suspect that would reduce the time to build your rings significantly, wouldn't it?
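To put rough numbers on that suggestion, here is a small sketch of how the amount of work scales with partition power. It assumes the ring has 2 ** part_power partitions and 3 replicas, as in the "create 18 3 1" command from the steps to reproduce; ring_size is a hypothetical helper, not part of Swift.

```python
def ring_size(part_power, replicas=3):
    """Return (partitions, replica assignments) for a given partition power."""
    partitions = 2 ** part_power
    return partitions, partitions * replicas


for power in (10, 12, 18):
    partitions, assignments = ring_size(power)
    print(f"part_power={power}: {partitions} partitions, {assignments} assignments")
```

At power 18 this gives 262144 partitions and 786432 replica assignments, which coincides with the 786432 calls to _sort_key_for in the profile. At power 12 the same product is 12288, that is 64 times fewer assignments to place.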

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/63392

Vladimir Kuklin (vkuklin) wrote :

Reverting Havana's rebalance improvement commit decreased the rebalance time from 8 minutes to 5.5.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/63392
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=79002d33327780039f481b78fe7a5d6e70e645c0
Submitter: Jenkins
Branch: master

commit 79002d33327780039f481b78fe7a5d6e70e645c0
Author: Vladimir Kuklin <email address hidden>
Date: Fri Dec 20 16:19:29 2013 +0400

    Calculate partition power for swift

    Calculage partition power according to
    https://answers.launchpad.net/swift/+question/211929

    power = int(log2(devnumber*100)) + resize_factor

    resize_factor can be set in $::fuel_settings['swift'] hash.
    default value: 2 (2048 partitions in standard setup)

    Change-Id: Icbc518b05f8170b8656b6aad701bf3cece89f74f
    Related-bug: #1261659
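The formula in the commit message can be checked with a short sketch. The actual fix lives in fuel-library's Puppet code; the Python below only re-expresses the formula as quoted, and the function name partition_power and the input check are assumptions added for illustration.

```python
import math


def partition_power(devnumber, resize_factor=2):
    """power = int(log2(devnumber * 100)) + resize_factor, per the commit message."""
    if devnumber < 1:
        raise ValueError("devnumber must be at least 1")
    return int(math.log2(devnumber * 100)) + resize_factor


# Six devices, as in the steps to reproduce (3 nodes with 2 devices each).
power = partition_power(6)
print(power, 2 ** power)
```

With 6 devices and the default resize_factor of 2 this yields a power of 11, that is 2048 partitions, which agrees with the "2048 partitions in standard setup" note in the commit message.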

Vladimir Kuklin (vkuklin) wrote :

This bug can be closed as soon as the related fix is committed. The fix calculates the ring partition power from the number of initial devices and the resize_factor provided by the user. A bug for the rebalance optimization itself was created in the Swift project. This bug is no longer blocking the release.

Changed in fuel:
status: In Progress → Fix Committed
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
status: Fix Committed → Fix Released