Slow Ring Loading in 2.7 due to Ring Unpickling
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Object Storage (swift) | Fix Released | High | Darrell Bishop |
Bug Description
The time it takes to load or reload each ring increased by two orders of magnitude between Python 2.6 and 2.7, and the process consumes 100% CPU while (re)loading each ring.
On my (weak) server, loading each ring (with 2^18 partitions) has grown from 0.01s to 3-4s.
Immediate Solution (if you are bothered by this)
The problem exists only for rings created in Python 2.7.
As an immediate solution, Python 2.6 can be used when producing or updating rings (even if the deployment runs on Python 2.7).
Rings produced in Python 2.6 load considerably faster on Python 2.7.
This immediate solution was tested with an older version of Swift and needs to be retested with the latest version.
Long term Solution
It is better for Swift to avoid pickling the rings.
Pickling is both insecure and not needed in this case.
Serialization of the rings can be done with an alternative serialization format. I had a quick look at json, but it seems that most json implementations do not support array.array; this needs further investigation.
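To illustrate the json limitation mentioned above, here is a minimal sketch using Python 3's standard library json module (an assumption; the report concerns Python 2.6/2.7, and other json implementations may behave differently). The array contents are made up. It shows that array.array is rejected directly, and two possible workarounds: converting to a list, or embedding the raw bytes as base64 text.

```python
import array
import base64
import json

# Made-up sample data standing in for one row of partition-to-device ids.
row = array.array("H", [0, 1, 2, 3])

# The stdlib encoder does not know how to serialize array.array.
try:
    json.dumps(row)
    direct_ok = True
except TypeError:
    direct_ok = False

# Workaround 1: convert to a plain list (one JSON number per element).
as_list = json.dumps(row.tolist())

# Workaround 2: embed the raw bytes as a single base64 string.
# Like any raw dump, this depends on the machine's byte order and item size.
as_blob = json.dumps(base64.b64encode(row.tobytes()).decode("ascii"))

# Restore the array from the base64 form.
restored = array.array("H")
restored.frombytes(base64.b64decode(json.loads(as_blob)))
```

The list form is portable but parses one element at a time; the base64 form parses as a single string but is architecture-dependent.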
Another option, which seems like the better one all around, is to write dedicated ring serialization code.
This will result in the fastest implementation.
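A dedicated format along these lines could look like the sketch below. This is not the format Swift adopted; the on-disk layout (a length-prefixed JSON header followed by raw array bytes), the function names dump_ring and load_ring, and the dictionary keys are all hypothetical, chosen only to illustrate the idea. It is written for Python 3.

```python
import array
import io
import json
import struct
import sys


def dump_ring(ring, fileobj):
    """Write a hypothetical ring dict: JSON header, then raw array bytes."""
    rows = ring["replica2part2dev_id"]
    header = {
        "devs": ring["devs"],
        "replica_count": len(rows),
        "part_count": len(rows[0]),
        # Recorded so a reader on a different architecture can byte-swap.
        "byteorder": sys.byteorder,
    }
    header_bytes = json.dumps(header).encode("utf-8")
    # 4-byte big-endian length prefix, then the header itself.
    fileobj.write(struct.pack("!I", len(header_bytes)))
    fileobj.write(header_bytes)
    for row in rows:
        fileobj.write(array.array("H", row).tobytes())


def load_ring(fileobj):
    """Read back what dump_ring wrote; each row loads as one bulk copy."""
    (header_len,) = struct.unpack("!I", fileobj.read(4))
    header = json.loads(fileobj.read(header_len).decode("utf-8"))
    rows = []
    for _ in range(header["replica_count"]):
        row = array.array("H")
        row.frombytes(fileobj.read(header["part_count"] * row.itemsize))
        if header["byteorder"] != sys.byteorder:
            row.byteswap()
        rows.append(row)
    return {"devs": header["devs"], "replica2part2dev_id": rows}


# Round trip through an in-memory buffer with made-up data.
sample = {
    "devs": [{"id": 0, "zone": 1}, {"id": 1, "zone": 2}],
    "replica2part2dev_id": [
        array.array("H", [0, 1, 0, 1]),
        array.array("H", [1, 0, 1, 0]),
    ],
}
buf = io.BytesIO()
dump_ring(sample, buf)
buf.seek(0)
loaded = load_ring(buf)
```

Because each row is read with a single frombytes() call, load time does not depend on a per-element interpreter loop, and no pickle is involved.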
What is causing the slowdown
The serialization of array.array(), which is implemented in __reduce__() within CPython, changed between Python 2.6 and Python 2.7.
Python 2.6 used a single BINSTRING to express the entire array, while Python 2.7 uses a BININT for each element.
So the processing of a single long BINSTRING element during pickle.load() or pickle.loads() is replaced with 2^18 elements being processed one by one.
The change was apparently made in order to align 2.7 with 3.*
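The per-element behaviour can be observed with pickletools. The sketch below is an approximation run on Python 3, not a reproduction on 2.6/2.7: it assumes that pickling an array.array with protocol 2 on Python 3 emits one integer opcode per element (as 2.7 does), and it stands in for the 2.6 behaviour by pickling the raw bytes as a single object. The array size and contents are made up.

```python
import array
import pickle
import pickletools


def count_int_opcodes(payload):
    """Count BININT-family opcodes (BININT, BININT1, BININT2) in a pickle."""
    return sum(
        1
        for opcode, _arg, _pos in pickletools.genops(payload)
        if opcode.name.startswith("BININT")
    )


n = 1 << 12  # smaller than the 2^18 in the report, to keep the demo quick
part2dev = array.array("H", (i % 64 for i in range(n)))

# Protocol 2: the array is reduced to a list, one opcode per element.
per_element = pickle.dumps(part2dev, protocol=2)

# Stand-in for the 2.6 style: the whole array as one opaque blob.
as_blob = pickle.dumps(part2dev.tobytes(), protocol=2)

per_element_ops = count_int_opcodes(per_element)
blob_ops = count_int_opcodes(as_blob)
```

Here per_element_ops grows with the number of partitions, while blob_ops stays constant, which matches the explanation above for why load time scales with ring size.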
Changed in swift: | |
importance: | Undecided → High |
description: | updated |
Changed in swift: | |
milestone: | none → 1.7.0 |
Changed in swift: | |
status: | Fix Committed → Fix Released |
I benchmarked some alternatives: https://gist.github.com/859ba4995a3df9f45913#file_report_2.7.markdown
Personally, I prefer the msgpack implementation which serializes the array.array data as a string. It deserializes much faster than when serialized as a list, is just as architecture-dependent as Python 2.6 rings were, and is simpler than the "custom" code. The only downside I can see would be an added dependency on msgpack-python for Swift.