Slow Ring Loading in 2.7 due to Ring Unpickling

Bug #1031954 reported by David Hadas on 2012-08-01
This bug affects 3 people
Affects: OpenStack Object Storage (swift)
Importance: High
Assigned to: Darrell Bishop

Bug Description

The time it takes to load or reload each ring increased by two orders of magnitude between Python 2.6 and 2.7, consuming 100% CPU while (re)loading each ring.
On my (weak) server, loading each ring (with 2^18 partitions) grew from 0.01s to 3-4s.

Immediate Solution (if you are bothered by this)
The problem exists only for rings created with Python 2.7.
As an immediate workaround, Python 2.6 can be used when producing or updating rings (even if the deployment runs Python 2.7).
Rings produced with Python 2.6 load considerably faster on Python 2.7.
This intermediate solution was tested with an older version of Swift and needs to be retested with the latest version.

Long term Solution
It is better for Swift to avoid pickling the rings.
Pickling is both insecure and unneeded in this case.
Serialization of the rings can be done with an alternative serialization format. I had a quick look at JSON, but it seems that most JSON implementations do not support array.array; this needs further investigation.

Another option, which seems like the better one all around, is to create dedicated ring serialization code.
This would result in the fastest implementation.

What is causing the slowdown
The serialization of array.array(), implemented in __reduce__() within CPython, changed between Python 2.6 and Python 2.7:
     Python 2.6 used a single BINSTRING to express the entire array, while Python 2.7 uses a BININT per element.
     So the processing of one long BINSTRING element during pickle.load() or pickle.loads() is replaced with the processing of 2^18 elements.
     The change was apparently made in order to align 2.7 with 3.*
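The per-element vs. single-blob difference can be emulated in current Python (a rough sketch, not the actual 2.6/2.7 opcodes; the array contents are made up):

```python
import array
import pickle
import timeit

# A part-power-18 replica row: 2**18 unsigned shorts, like a Swift ring.
arr = array.array('H', (i % 100 for i in range(2**18)))

# Python 2.6 style: the whole array pickled as one opaque string
# (a single BINSTRING); emulated here by pickling the raw bytes.
blob_as_bytes = pickle.dumps(arr.tobytes())

# Python 2.7 style: one integer opcode per element (a BININT each);
# emulated here by pickling a Python list of the same values.
blob_as_list = pickle.dumps(list(arr))

t_bytes = timeit.timeit(lambda: pickle.loads(blob_as_bytes), number=5)
t_list = timeit.timeit(lambda: pickle.loads(blob_as_list), number=5)
print('blob-style: %.4fs  per-element-style: %.4fs' % (t_bytes, t_list))
```

On any recent interpreter the per-element form deserializes far slower, which is the shape of the regression described above.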

Changed in swift:
importance: Undecided → High
Darrell Bishop (darrellb) wrote :

I benchmarked some alternatives:

Personally, I prefer the msgpack implementation which serializes the array.array data as a string. It deserializes much faster than when serialized as a list, is just as architecture-dependent as Python 2.6 rings were, and is simpler than the "custom" code. The only downside I can see would be an added dependency on msgpack-python for Swift.

Mike Barton (redbo) wrote :

I like msgpack, but I'm not all about adding a new dependency to speed up a rare operation that only takes a few ms anyway.

gholt (gholt) wrote :

The "ms" is confusing. Usually that means milliseconds but I'm pretty sure by context that's not what it means here. Does it mean minutes?

Mike Barton (redbo) wrote :


Darrell Bishop (darrellb) wrote :

On my (new) Macbook Pro (specs in that gist), a Python 2.7-pickled part-power=18 ring took ~3271ms, while the same ring pickled by Python 2.6 deserializes in ~6ms. That's between 2 and 3 orders of magnitude (545x).

You should be able to run the same benchmark script (dependencies include having a ring in the current directory named object.ring.gz and the benchmark module installed) with a RAX-production-cluster-size ring and see what that looks like.

Darrell Bishop (darrellb) wrote :

Also, I think the choice isn't "do we fix this or not", but "what new (de)serialization scheme will we use". I think JSON would be a horrible choice; I'll just get that out there now.

I see two main types of schemes: some external module or custom code w/no new dependencies. They both have plusses and minuses. I think something like msgpack will be more efficient than comparably-flexible pure-python code (my "custom" deserialization is NOT robust to deserializing a different on-disk format, for instance). With the custom encoding, you'd have to version the on-disk format and worry about backwards compatibility for *deserialization*. I don't think pickle or msgpack have to worry about that (doing the right thing with what you deserialized is unavoidable and not directly relevant to the serialization format).

Note that most external tools for (de)serialization (eg. protocol buffers) will probably not handle array.array data natively (msgpack-python explicitly refuses to try). On the other hand, if an external tool did support it, there's a chance the tool would make a different portability/performance trade-off than we would like. You can see one such difference in the two ways I encoded the array.array data for the two msgpack-based implementations. Converting each array.array to a list and letting msgpack encode that, vs converting each array.array to a string and encoding/decoding it without any Unicode encoding has a deserialization performance difference of 10x! One is architecture independent (the slower, list-based one) and one is equally-architecture dependent as the Python 2.6 pickling (it's basically a memory dump following the byte-ordering of the architecture).

Anyway, just some more food for thought on the issue :)

description: updated
David Hadas (david-hadas) wrote :

I think Mike was confused by a typo in my original note (I indicated a change was from 0.01ms to 3-4ms, while I meant seconds rather than ms... :)

I am not sure about the value that msgpack brings to the table in this case.

Apparently there are two parts with different characteristics to serialize:
     a. The device table and any other parameter we may save as part of the ring, apart from the arrays
     b. The very long arrays
So maybe we should treat the two differently.

As for a., any serialization will do - could be JSON or any other, or we could write our own. It is going to be fast anyhow. And we must use versioning, as Darrell indicated, in case there is a need to extend the ring data one day.
As for b., it may be appropriate to write a very simple serialization code dedicated for the task. E.g. write each entry as an unsigned short (assuming the max number of allowed devices is limited). This can be written, for example, right after the serialized representation of a.
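A minimal sketch of the unsigned-short idea, using only the stdlib (the variable names are made up for illustration):

```python
import array
import struct

part2dev_id = array.array('H', [3, 1, 4, 1, 5])

# Serialize: a 4-byte length prefix, then each entry as a
# network-byte-order unsigned short.
blob = struct.pack('!I%dH' % len(part2dev_id),
                   len(part2dev_id), *part2dev_id)

# Deserialize: read the count, then unpack that many unsigned shorts.
count, = struct.unpack_from('!I', blob)
restored = array.array('H', struct.unpack_from('!%dH' % count, blob, 4))

assert restored == part2dev_id
```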

Darrell Bishop (darrellb) wrote :

The value that msgpack brings is apparent if you compare the two pairs of msgpack-based (de)serialization routines in my gist with the "custom" encoder/decoder. The msgpack-based implementations are only a few lines of simple code each, one of them is just as fast as a more complicated/rigid custom implementation, and the msgpack (de)serialization code is robust to changing underlying data (in contrast with the "custom" implementation).

BTW, I think if you deserialize the array.array data by reading each entry as an unsigned short and sticking it into the new array.array, you'll have performance on par with the slower msgpack-based solution (~10x worse than the one which maintains the array.array data as it sits in memory but still *way* faster than Python 2.7's pickle.load). I think you have to dump the data with .tostring() and load it using that (which works fine through the constructor).
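The .tostring() round trip described above looks like this (on Python 3 the method is named .tobytes(), but the array constructor accepts the raw buffer either way):

```python
import array

part2dev_id = array.array('H', [0, 1, 2, 1, 0])

# Dump the array as a raw memory buffer (native byte order)...
raw = part2dev_id.tobytes()   # .tostring() on Python 2

# ...and load it back "through the constructor", with no per-element work.
restored = array.array('H', raw)

assert restored == part2dev_id
```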

I'll cook up a hybrid JSON/custom encoder/decoder, see how it looks and performs, and update the gist.

Darrell Bishop (darrellb) wrote :

Here are my results with a hybrid JSON/struct approach included:

As you predicted, the JSON/struct hybrid performs extremely well. Here's the code in case you're having trouble viewing the gist:

def json_hybrid_serialize_ring(ring, filename):
    gz = GzipFile(filename, 'wb', compresslevel=GZ_LEVEL)
    json_text = json.dumps([ring['part_shift'], ring['devs']])
    json_len = len(json_text)
    gz.write(struct.pack('!I%ds' % json_len,
                         json_len, json_text))
    gz.write(struct.pack('!H', len(ring['replica2part2dev_id'])))
    for part2dev_id in ring['replica2part2dev_id']:
        part_count = len(part2dev_id)
        gz.write(struct.pack(
            '!II%ds' % (part_count * part2dev_id.itemsize,),
            part_count, part2dev_id.itemsize, part2dev_id.tostring()))
    gz.close()

def json_hybrid_deserialize_ring(filename):
    gz = GzipFile(filename)
    ring_dict = {
        'replica2part2dev_id': [],
    }
    json_len, = struct.unpack('!I', gz.read(4))
    ring_dict['part_shift'], ring_dict['devs'] = json.loads(
        gz.read(json_len))
    replica_count, = struct.unpack('!H', gz.read(2))
    for _ in range(replica_count):
        part_count, part_size = struct.unpack('!II', gz.read(8))
        ring_dict['replica2part2dev_id'].append(
            array.array('H', gz.read(part_count * part_size)))
    return ring_dict

The on-disk structure is: a 4-byte length followed by the JSON-encoded [part_shift, devs] header; a 2-byte replica count; then, per replica, a 4-byte part count, a 4-byte item size, and the raw array.array bytes (part_count * itemsize).

Mike Barton (redbo) wrote :

I've always fancied having a version of the ring where all of the processes would share memory. Here's a sort of dirty stab at one.

Fix proposed to branch: master

Changed in swift:
assignee: nobody → Darrell Bishop (darrellb)
status: New → In Progress

Submitter: Jenkins
Branch: master

commit f8ce43a21891ae2cc00d0770895b556eea9c7845
Author: Darrell Bishop <email address hidden>
Date: Sun Aug 5 00:51:49 2012 -0700

    Use custom encoding for RingData, not pickle.

    Serialize RingData in a versioned, custom format which is a combination
    of a JSON-encoded header and .tostring() dumps of the
    replica2part2dev_id arrays. This format deserializes hundreds of times
    faster than rings serialized with Python 2.7's pickle (a significant
    performance regression for ring loading between Python 2.6 and Python
    2.7). Fixes bug 1031954.

    swift.common.ring.ring.RingData is now responsible for serialization and
    deserialization of its data via a new load() class method and save()
    object method. The new implementation is backward-compatible; if a ring
    does not begin with a new-style magic string, it is assumed to be an
    old-style pickle-dumped ring and is handled as before. So new Swift
    code can read old rings, but old Swift code will not be able to read
    rings serialized in the new format.

    I didn't want to bite off more than necessary, so I didn't mess with
    builder file serialization.

    Change-Id: I799b9a4c894d54fb16592443904ac055b2638e2d

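The backward-compatibility check described in the commit can be sketched roughly like this (the b'R1NG' magic, the version field, and the function name are assumptions for illustration, not copied from the Swift source):

```python
import gzip
import os
import pickle
import struct
import tempfile

def load_ring(filename):
    """Backward-compatible loader sketch: sniff a magic string; when it
    is absent, fall back to the old pickle format."""
    with gzip.open(filename, 'rb') as gz:
        magic = gz.read(4)
        if magic == b'R1NG':              # assumed new-style magic
            version, = struct.unpack('!H', gz.read(2))
            raise NotImplementedError('parse the new format here')
        gz.seek(0)  # no magic: rewind and unpickle as before
        return pickle.load(gz)

# Demo: an old-style (pickled) ring still loads through the same entry point.
fd, path = tempfile.mkstemp(suffix='.ring.gz')
os.close(fd)
with gzip.open(path, 'wb') as gz:
    pickle.dump({'part_shift': 14, 'devs': []}, gz)
old_style = load_ring(path)
os.remove(path)
```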
Changed in swift:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2012-09-04
Changed in swift:
milestone: none → 1.7.0
Thierry Carrez (ttx) on 2012-09-04
Changed in swift:
status: Fix Committed → Fix Released