OpenStack Object Storage (swift)

Slow Ring Loading in 2.7 due to Ring Unpickling

Bug #1031954 reported by David Hadas on 2012-08-01

This bug affects 3 people

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Object Storage (swift)	Fix Released	High	Darrell Bishop	OpenStack Object Storage (swift) 1.7.0

Bug Description

The time it takes to load or reload each ring was increased by two orders of magnitude between Python 2.6 and 2.7, consuming 100% CPU while (re)loading each ring.
On my (weak) server, loading each ring (with 2^18 partitions) has grown from 0.01s to 3-4s

Immediate Solution (if you bothered by this)
The problem exist only for Rings created in Python 2.7.
As an immediate solution, python 2.6 can be used when producing or updating rings (even if the deployment is with Python 2.7)
Rings produced in Python 2.6 load considerably faster on python 2.7.
This intermediate solution was tested with an older version of Swift and need to be retested with the latest version.

Long term Solution
It is better for swift to avoid pickling the Rings.
Pickling is both insecure and not needed in this case.
Serialization of the Rings can be done with alternative serialization codes - I had a quick look at json but it seems that most json implementations do not support array.array - need to further investigate.

Another option which seems like the better one all around is to create a dedicated ring serialization code.
This will result in the fastest implementation.

What is causing the slowdown
The serialization of array.array(), which is implemented in _reduce__(), within cpython, had changed between python 2.6 and python 2.7
     Python 2.6 used a single BINSTRING to express the entire array, while Python 2.7 uses a BININT per each element.
     So processing of a single long element of BINSTRING during pickle.load() or pickle.loads() is replaced with 2^18 elements being processed.
     The change was apparently made in order to align 2.7 with 3.*

See original description

John Dickinson (notmyname) on 2012-08-01

Changed in swift:
importance:	Undecided → High

Revision history for this message

Darrell Bishop (darrellb) wrote on 2012-08-03:

I benchmarked some alternatives:
https://gist.github.com/859ba4995a3df9f45913#file_report_2.7.markdown

Personally, I prefer the msgpack implementation which serializes the array.array data as a string. It deserializes much faster than when serialized as a list, is just as architecture-dependent as Python 2.6 rings were, and is simpler than the "custom" code. The only downside I can see would be an added dependency on msgpack-python for Swift.

Revision history for this message

Mike Barton (redbo) wrote on 2012-08-03:

I like msgpack, but I'm not all about adding a new dependency to speed up a rare operation that only takes a few ms anyway.

Revision history for this message

gholt (gholt) wrote on 2012-08-03:

The "ms" is confusing. Usually that means milliseconds but I'm pretty sure by context that's not what it means here. Does it mean minutes?

Revision history for this message

Mike Barton (redbo) wrote on 2012-08-03:

Revision history for this message

Darrell Bishop (darrellb) wrote on 2012-08-03:

On my (new) Macbook Pro (specs in that gist), a Python 2.7-pickled part-power=18 ring took ~3271ms, while the same ring pickled by Python 2.6 deserializes in ~6ms. That's between 2 and 3 orders of magnitude (545x).

You should be able to run the same benchmark script (dependencies include having a ring in the current directory named object.ring.gz and the benchmark module installed) with a RAX-production-cluster-size ring and see what that looks like.

Revision history for this message

Darrell Bishop (darrellb) wrote on 2012-08-03:

Also, I think the choice isn't "do we fix this or not", but "what new (de)serialization scheme will we use". I think JSON would be a horrible choice; I'll just get that out there now.

I see two main types of schemes: some external module or custom code w/no new dependencies. They both have plusses and minuses. I think something like msgpack will be more efficient than comparably-flexible pure-python code (my "custom" deserialization is NOT robust to deserializing a different on-disk format, for instance). With the custom encoding, you'd have to version the on-disk format and worry about backwards compatibility for *deserialization*. I don't think pickle or msgpack have to worry about that (doing the right thing with what you deserialized is unavoidable and not directly relevant to the serialization format).

Note that most external tools for (de)serialization (eg. protocol buffers) will probably not handle array.array data natively (msgpack-python explicitly refuses to try). On the other hand, if an external tool did support it, there's a chance the tool would make a different portability/performance trade-off than we would like. You can see one such difference in the two ways I encoded the array.array data for the two msgpack-based implementations. Converting each array.array to a list and letting msgpack encode that, vs converting each array.array to a string and encoding/decoding it without any Unicode encoding has a deserialization performance difference of 10x! One is architecture independent (the slower, list-based one) and one is equally-architecture dependent as the Python 2.6 pickling (it's basically a memory dump following the byte-ordering of the architecture).

Anyway, just some more food for thought on the issue :)

David Hadas (david-hadas) on 2012-08-03

description:

updated

Revision history for this message

David Hadas (david-hadas) wrote on 2012-08-03:

I think Mike was confused by a typo in my original note (I indicated a change was from 0.01ms to 3-4ms, while I meant seconds rather than ms... :)

I am not sure about the value that msgpack brings to the table in this case.

Apparently there are two parts with different characteristics to serialize:
a. The device table and any other parameter we may save as part of teh ring apart from the arrays
b. The very long arrays
So maybe we should treat the two differently.

As for a. any serialization will do - could be json or any other, or we could write our own. It is going to be fast anyhow. And we must use versioning as as Darrel indicated in case there is a need to extend the ring data one day.
As for b. It may be appropriate to write a very simple serialization code dedicated for the task. E.g. write each entry as an unsigned short (assuming the max number of allowed devices is limited). This can be written for example right after the the serialized representation of a.

Revision history for this message

Darrell Bishop (darrellb) wrote on 2012-08-04:

David,
The value that msgpack brings is apparent if you compare the two pairs of msgpack-based (de)serialization routines in my gist with the "custom" encoder/decoder. The msgpack-based implementations are only a few lines of simple code each, one of them is just as fast as a more complicated/rigid custom implementation, and the msgpack (de)serialization code is robust to changing underlying data (in contrast with the "custom" implementation).

BTW, I think if you deserialize the array.array data by reading each entry as an unsigned short and sticking it into the new array.array, you'll have performance on par with the slower msgpack-based solution (~10x worse than the one which maintains the array.array data as it sits in memory but still *way* faster than Python 2.7's pickle.load). I think you have to dump the data with .tostring() and load it using that (which works fine through the constructor).

I'll cook up a hybrid JSON/custom encoder/decoder, see how it looks and performs, and update the gist.

Revision history for this message

Darrell Bishop (darrellb) wrote on 2012-08-04:

David,
Here are my results with a hybrid JSON/struct approach included: https://gist.github.com/859ba4995a3df9f45913#file_report_2.7.markdown

As you predicted, the JSON/struct hybrid performs extremely well. Here's the code in case you're having trouble viewing the gist:

def json_hybrid_serialize_ring(ring, filename):
    gz = GzipFile(filename, 'wb', compresslevel=GZ_LEVEL)
    json_text = json.dumps([ring['part_shift'], ring['devs']])
    json_len = len(json_text)
    gz.write(struct.pack('!I%ds' % json_len,
                         json_len, json_text))
    gz.write(struct.pack('!H', len(ring['replica2part2dev_id'])))
    for part2dev_id in ring['replica2part2dev_id']:
        part_count = len(part2dev_id)
        gz.write(struct.pack(
            '!II%ds' % (part_count * part2dev_id.itemsize,),
            part_count, part2dev_id.itemsize, part2dev_id.tostring()))
    gz.close()

def json_hybrid_deserialize_ring(filename):
    gz = GzipFile(filename)
    ring_dict = {
        'replica2part2dev_id': [],
    }
    json_len, = struct.unpack('!I', gz.read(4))
    ring_dict['part_shift'], ring_dict['devs'] = json.loads(gz.read(json_len))
    replica_count, = struct.unpack('!H', gz.read(2))
    for _ in range(replica_count):
        part_count, part_size = struct.unpack('!II', gz.read(8))
        ring_dict['replica2part2dev_id'].append(
            array.array('H', gz.read(part_count * part_size)))
    return ring_dict

The on-disk structure is:
<json_len><json_text><replica_count>[<part_count><part_size><part_data>...]

Revision history for this message

Mike Barton (redbo) wrote on 2012-08-04:

#10

I've always fancied having a version of the ring where all of the processes would share memory. Here's a sort of dirty stab at one.

https://gist.github.com/28d39fc15a2eb24c1bad

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2012-08-05: Fix proposed to swift (master)

#11

Fix proposed to branch: master
Review: https://review.openstack.org/10846

Changed in swift:
assignee:	nobody → Darrell Bishop (darrellb)
status:	New → In Progress

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2012-08-13: Fix merged to swift (master)

#12

Reviewed: https://review.openstack.org/10846
Committed: http://github.com/openstack/swift/commit/f8ce43a21891ae2cc00d0770895b556eea9c7845
Submitter: Jenkins
Branch: master

commit f8ce43a21891ae2cc00d0770895b556eea9c7845
Author: Darrell Bishop <email address hidden>
Date: Sun Aug 5 00:51:49 2012 -0700

Use custom encoding for RingData, not pickle.

    Serialize RingData in a versioned, custom format which is a combination
    of a JSON-encoded header and .tostring() dumps of the
    replica2part2dev_id arrays. This format deserializes hundreds of times
    faster than rings serialized with Python 2.7's pickle (a significant
    performance regression for ring loading between Python 2.6 and Python
    2.7). Fixes bug 1031954.

    swift.common.ring.ring.RingData is now responsible for serialization and
    deserialization of its data via a new load() class method and save()
    object method. The new implementation is backward-compatible; if a ring
    does not begin with a new-style magic string, it is assumed to be an
    old-style pickle-dumped ring and is handled as before. So new Swift
    code can read old rings, but old Swift code will not be able to read
    newly-serialized rings. THIS SHOULD BE MENTIONED PROMINENTLY IN THE
    RELEASE NOTES.

I didn't want to bite of more than necessary, so I didn't mess with
builder file serialization.

Change-Id: I799b9a4c894d54fb16592443904ac055b2638e2d

Changed in swift:
status:	In Progress → Fix Committed

Thierry Carrez (ttx) on 2012-09-04

Changed in swift:
milestone:	none → 1.7.0

Thierry Carrez (ttx) on 2012-09-04

Changed in swift:
status:	Fix Committed → Fix Released

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.