Comment 1 for bug 585245

Matt Giuca (mgiuca) wrote:

I have had a very close look at the cjson source. It is actually based on the C code behind Python's 'repr' function, so it deliberately encodes characters outside the Basic Multilingual Plane using the '\U00xxxxxx' notation. That notation appears nowhere in the JSON spec, which only allows '\uXXXX' escapes and writes such characters as a UTF-16 surrogate pair. cjson produces this output on both the UCS2 and UCS4 builds of Python: on UCS2 builds, it even goes so far as to detect UTF-16 surrogate pairs and convert them to '\U00xxxxxx' notation.
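As a quick illustration of the difference, here is a small sketch using the standard-library json module. It is written in Python 3 syntax rather than the Python 2 that IVLE currently runs, and U+1F600 is just an arbitrary non-BMP character chosen for the example. A spec-compliant encoder emits a surrogate pair, while a repr-style '\U0001f600' escape is rejected by a conforming decoder.

```python
import json

# U+1F600 (an emoji) lies outside the Basic Multilingual Plane.
s = "\U0001F600"

# A spec-compliant encoder writes it as a UTF-16 surrogate pair,
# i.e. two \uXXXX escapes.
encoded = json.dumps(s)
print(encoded)  # "\ud83d\ude00"

# A repr-style \U00xxxxxx escape is not a legal JSON escape,
# so a conforming decoder refuses it.
repr_style = '"\\U0001f600"'
try:
    json.loads(repr_style)
    rejected = False
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    rejected = True
print("rejected:", rejected)  # rejected: True

# Decoding the compliant form recovers the original character.
assert json.loads(encoded) == s
```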

I have written a patch against cjson, which I will send to the author (I can't find a public repository or bug tracker for it anywhere; can you?). However, it won't be easy to get a fixed cjson into IVLE in the near future, so I recommend using another library to tide us over until we can require Python 2.6 or later (which has a built-in JSON library).

Working around this would be quite hard. It wouldn't even be enough to pre-process the strings so they contain UTF-16 surrogate pairs, because cjson would just convert those pairs back into '\U00xxxxxx' escapes. You would have to post-process the output string. Not worth it. Change libraries.
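To show why post-processing is fiddly, here is a minimal sketch of what it would involve. It assumes the encoder emits escapes of exactly the form \U followed by eight hex digits, as in the notation above. The helper `fix_repr_escapes` is a hypothetical name, not part of cjson or any library. The tricky part is that a "\U" in the output may be a real escape or the tail of an escaped backslash ("\\U"), so the pattern has to account for every backslash before it.

```python
import re

# Hypothetical post-processor, assuming the encoder emits repr-style
# escapes of exactly the form \UXXXXXXXX (eight hex digits).
# (?<!\\)((?:\\\\)*) consumes whole pairs of escaped backslashes first,
# so "\\U..." (a literal backslash followed by U) is left alone.
_REPR_ESCAPE = re.compile(r'(?<!\\)((?:\\\\)*)\\U([0-9a-fA-F]{8})')


def fix_repr_escapes(text):
    """Rewrite \\UXXXXXXXX escapes in encoded JSON as \\uXXXX escapes."""
    def to_json_escape(match):
        code_point = int(match.group(2), 16)
        if code_point > 0x10FFFF:
            raise ValueError("not a Unicode code point: %r" % match.group(0))
        if code_point < 0x10000:
            escaped = "\\u%04x" % code_point
        else:
            # Split a non-BMP code point into a UTF-16 surrogate pair.
            offset = code_point - 0x10000
            high = 0xD800 + (offset >> 10)
            low = 0xDC00 + (offset & 0x3FF)
            escaped = "\\u%04x\\u%04x" % (high, low)
        # Keep any escaped backslashes that preceded the \U escape.
        return match.group(1) + escaped
    return _REPR_ESCAPE.sub(to_json_escape, text)


print(fix_repr_escapes('"\\U0001f600"'))    # "\ud83d\ude00"
print(fix_repr_escapes('"\\\\U0001f600"'))  # "\\U0001f600" (unchanged)
```

Even this small sketch needs careful backslash counting, and it would have to run over every payload the encoder produces. That is a lot of fragile machinery compared with switching to a library that emits valid JSON in the first place.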