Extremely slow when performing a swift-ring-builder rebalance

Bug #1473899 reported by Yao Long
Affects: OpenStack Object Storage (swift)
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I'm initializing a Swift environment. I'm trying to rebalance a 2^24 ring, and I see that only one CPU core is 100% occupied doing the job while the others are barely running at all, and it seems that it's not able to finish the job. Am I missing any configuration?

Yao Long (yao.long)
description: updated
Samuel Merritt (torgomatic) wrote :

swift-ring-builder is a single-threaded process, so it will not use multiple cores.

Also, please define "not able to finish the job". How long does the process take to finish? If you let it run for an hour, or overnight, does the process exit successfully?

Changed in swift:
status: New → Incomplete
Yao Long (yao.long) wrote :

Yeah, it finished after about 58 minutes, taking much longer than I'd expected.

Samuel Merritt (torgomatic) wrote :

There are some possible performance improvements on master (commit 2328983, for example) that might help, depending on the particulars of your ring. However, a part-power of 24 is, frankly, enormous. A rebalance typically takes a minute or two for a part-power of 18; scaling that up to a part-power of 24 means I'd expect anywhere from 1 to 2 hours per rebalance.
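The 1 to 2 hour estimate above follows from simple scaling. A rough sketch, assuming rebalance time grows linearly with the number of partitions (the function name is hypothetical and this is not part of swift-ring-builder):

```python
def estimate_rebalance_minutes(known_minutes, known_part_power, target_part_power):
    """Scale a known rebalance time by the ratio of partition counts.

    Assumption: time is proportional to the number of partitions (2**part_power).
    """
    scale = 2 ** (target_part_power - known_part_power)
    return known_minutes * scale


# "A minute or two" at part-power 18, scaled up to part-power 24 (64x the partitions).
low = estimate_rebalance_minutes(1, 18, 24)
high = estimate_rebalance_minutes(2, 18, 24)
print(f"{low} to {high} minutes")
```

Under that assumption the range works out to 64 to 128 minutes, which is in line with the roughly 58 minutes reported earlier in this thread.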

John Dickinson (notmyname) wrote :

A part power of 24 would be sufficient for over 500000 drives in the cluster. Using 6TB drives, and assuming 3 replicas and a cluster 80% full, you end up with a 750PB cluster. If you have a cluster that is significantly smaller than 500000 drives, you'll end up with a lot of local filesystem overhead on each drive as Swift manages the placement of data. This will slow down auditing and replication.

Also, Swift can only manage 65535 drives today (~100PB with 6TB volumes). Although this device id limit would be trivial to update, the point I'm trying to make is that I strongly doubt you should be using a part power of 24. See http://docs.openstack.org/developer/swift/deployment_guide.html#preparing-the-ring for more guidance.
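The sizing argument above can be checked with back-of-the-envelope arithmetic. A minimal sketch, assuming decimal units (1 PB = 1000 TB) and counting each replica of a partition separately; both helper names are hypothetical and the drive counts are only illustrative:

```python
def replica_partitions_per_drive(part_power, replicas, drives):
    """Average number of partition replicas each drive would hold."""
    return (2 ** part_power) * replicas / drives


def usable_petabytes(drives, drive_tb, replicas, fill_fraction):
    """Usable capacity in PB after replication, at the given fill level."""
    return drives * drive_tb / replicas * fill_fraction / 1000


# Part power 24 with 3 replicas spread over 500000 drives: about 100 per drive.
print(round(replica_partitions_per_drive(24, 3, 500_000)))

# The same ring on a much smaller cluster (5000 drives, an illustrative figure):
# about 100x as many partition directories per drive to audit and replicate.
print(round(replica_partitions_per_drive(24, 3, 5_000)))

# Capacity at the 65535-device limit with 6 TB drives, 3 replicas, 80% full.
print(round(usable_petabytes(65_535, 6, 3, 0.8)))
```

With these assumptions a 500000-drive cluster comes out at roughly 800 PB usable, the same order of magnitude as the figure quoted above, and the 65535-device limit lands near 100 PB.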

Samuel Merritt (torgomatic) wrote :

This is one of those times I wish Launchpad had a resolution like "Unfortunate" or "Such Is Life".

I'd like things to be faster, but rebalancing 2^24 partitions is always going to take a significant amount of time.

Changed in swift:
status: Incomplete → Won't Fix