Comment 3 for bug 876810

Ilya Murav'jov (muravjov-il) wrote :

I just want to make it clear: the simplejson maintainers state that JSON may not contain certain binary data, even when serialized in \uXXXX form (because JSON is a Unicode text format, and e.g. \ud800 is a lone surrogate).
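
To illustrate why a lone surrogate is awkward, here is a minimal sketch. It assumes Python 3 and its standard json module rather than simplejson, whose behaviour may differ by version; is_encodable is a hypothetical helper introduced only for this example.

```python
import json

# Python 3's standard json module decodes a lone surrogate escape
# into a one-character str holding the code point U+D800.
decoded = json.loads('"\\ud800"')

def is_encodable(text):
    # Hypothetical helper: report whether text can be written out as UTF-8.
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False

# A properly paired escape decodes to a single valid character instead.
paired = json.loads('"\\ud83d\\ude00"')
```

Here the lone surrogate decodes but cannot be encoded as UTF-8, while the paired escape can, which is roughly the distinction the maintainers are drawing.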

For now, I convert all \udXXX escapes to a neutral #SdXXX form before analyzing the dumps, like so:

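import re

# Match \udXXX escapes that are not preceded by another backslash.
surrogate = re.compile(r"(?<!\\)\\u([dD][0-9a-fA-F]{3})")
def replace_surrogates(sample):
    # Rewrite each \udXXX escape as the neutral text #SdXXX.
    return surrogate.sub(r"#S\g<1>", sample)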
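
Note that the pattern above matches every \udXXX escape, including \ud000 through \ud7ff, which are ordinary characters rather than surrogates. A narrower variant could look like the sketch below; the names strict_surrogate and replace_strict_surrogates and the sample line are made up for illustration, and the line is not real meliae output.

```python
import json
import re

# Assumption: only U+D800..U+DFFF are surrogates, so the second hex digit
# is restricted to 8-f. Escapes such as \ud55c are left untouched.
strict_surrogate = re.compile(r"(?<!\\)\\u([dD][89a-fA-F][0-9a-fA-F]{2})")

def replace_strict_surrogates(sample):
    # Rewrite each surrogate escape as neutral #SdXXX text.
    return strict_surrogate.sub(r"#S\g<1>", sample)

# Hypothetical dump line containing one lone surrogate and one normal escape.
line = r'{"address": 1, "type": "str", "value": "a\ud800b\ud55c"}'
cleaned = replace_strict_surrogates(line)

# The cleaned line now parses without producing a lone surrogate.
obj = json.loads(cleaned)
```

Only the \ud800 escape is rewritten; \ud55c is left for the JSON parser to decode as usual.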

Thus it seems that strictly standards-compliant JSON is not a good fit for saving meliae dumps.