Comment 1 for bug 1260460

John Dickinson (notmyname) wrote :

You've run into an issue that does exist in Swift: containers with high cardinality can increase object write times, and they may be difficult to keep in sync with their replicas.

There are a few different ways to mitigate this:

1) Use many containers. This is the normal recommended path of "best practices" for using Swift clusters. I'm sorry if you were not able to come across this recommendation earlier, and I'd welcome your help in updating the docs to be clear that this is something to take into consideration when writing an application to store data in Swift.

2) Use faster hardware. As you noticed, the main limitation on container servers is the disk IO. Deploying account and container servers provisioned with SSDs goes a long way toward removing bottlenecks caused by container cardinality. I know this is simply a "throw more hardware at it" solution, but it is a relatively cheap and easy way to massively improve performance, even for containers with tens of millions of objects in them.

3) Update Swift's code. This is the best long-term solution, but it is by far the most expensive and complicated option. Many attempts to do this have been made and abandoned. Basically, what needs to happen is that a container is sharded across the cluster as it grows. The tricky parts involve preserving ordered listings, knowing when to split and join container segments (while keeping aggregated metadata), and doing all this while keeping the eventual consistency window low enough to keep the system usable.
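
As a rough illustration of option 1, here is a minimal client-side sketch that spreads objects over a fixed set of containers by hashing the object name. It is not part of Swift: the container prefix, the count of 16, and both function names are assumptions made up for this example. The second function shows why ordered listings are one of the tricky parts mentioned in option 3: once names are spread out by hash, a single sorted listing has to be rebuilt by merging the per-container listings.

```python
import hashlib
import heapq

# Hypothetical value; choose it from the expected number of objects.
NUM_CONTAINERS = 16


def container_for(object_name, prefix="data", num_containers=NUM_CONTAINERS):
    """Map an object name to one of num_containers container names.

    The same name always maps to the same container, so reads can find
    the object without any lookup table.
    """
    digest = hashlib.sha256(object_name.encode("utf-8")).hexdigest()
    index = int(digest, 16) % num_containers
    return "%s-%04d" % (prefix, index)


def merged_listing(per_container_listings):
    """Merge already-sorted per-container listings into one sorted list."""
    return list(heapq.merge(*per_container_listings))
```

An application would call container_for(name) before each PUT or GET to pick the container. Note that changing the number of containers later remaps most names, so the count is best chosen generously up front.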

There's been a placeholder blueprint for this feature for a long time. https://blueprints.launchpad.net/swift/+spec/container-sharding

If you are interested in tackling this problem, we'd all welcome your help. I believe it's something that we in the community must solve eventually. Please join us on freenode IRC in #openstack-swift if you'd like to talk about contributing container sharding to the project.

Finally, I'd like to correct one slight mistake in your report. Where possible, Swift does sync containers by sending deltas. Entire container databases are only sent over the network when a replica is completely missing.
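
The difference between the two cases can be pictured with a toy model. This is only a sketch of the idea and not Swift's replicator code: the function name, the (row_id, name) row shape, and the notion of a "last synced row id" are assumptions chosen for illustration.

```python
def plan_sync(local_rows, remote_exists, last_synced_row_id):
    """Decide what to send to a peer replica in this toy model.

    local_rows is a list of (row_id, object_name) tuples with increasing
    row ids. Returns ("full", all rows) when the peer has no copy at all,
    otherwise ("delta", rows newer than the last synced row id).
    """
    if not remote_exists:
        # Replica is completely missing: ship everything.
        return ("full", list(local_rows))
    # Replica exists: ship only the rows it has not seen yet.
    delta = [row for row in local_rows if row[0] > last_synced_row_id]
    return ("delta", delta)
```

With three local rows and a peer that has already seen rows 1 and 2, only row 3 is sent; the full set is sent only when the peer has nothing.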