Comment 20 for bug 1314129

Ihar Hrachyshka (ihar-hrachyshka) wrote :

Several Unicode-related issues were revealed by actual syncs from oslo-incubator to affected projects. nova and trove expect json.loads() to always return unicode strings, and glance expects json.load() to always return unicode strings.

The simplejson library applies some optimisations that do not guarantee unicode strings when ASCII-only bytes input is parsed. That's why we need to pass unicode strings and file objects to the underlying json implementation. For this, another oslo-incubator patch was sent to make sure json.load() and json.loads() always return unicode strings inside json dictionaries.
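Separately from that patch, here is a minimal sketch of the idea of handing only text to the underlying json implementation. It is written for Python 3, where str is the unicode type, and uses the stdlib json module; the helper name loads_text and its encoding parameter are hypothetical, not part of jsonutils.

```python
import json


def loads_text(s, encoding="utf-8"):
    """Hypothetical helper: decode bytes first, so the parser only sees text."""
    if isinstance(s, bytes):
        # Decode up front instead of relying on the parser's handling of bytes.
        s = s.decode(encoding)
    return json.loads(s)


# ASCII-only bytes input: the case the comment describes as problematic.
result = loads_text(b'{"key": "value"}')
```

Decoding before parsing means the result does not depend on how a particular json implementation treats ASCII-only byte input.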

After discussion with Doug in IRC and Gerrit, we've come to the conclusion that we can't just apply codecs.getreader('utf-8') to the fp argument of jsonutils.load(), since this may fail for ASCII-based encodings other than UTF-8. As per the official json module description:

"If the contents of fp are encoded with an ASCII based encoding other than UTF-8 (e.g. latin-1), then an appropriate encoding name must be specified. Encodings that are not ASCII based (such as UCS-2) are not allowed, and should be wrapped with codecs.getreader(encoding)(fp), or simply decoded to a unicode object and passed to loads()."

Also, as per JSONDecoder.__init__() in the official json.decoder module,

"``encoding`` determines the encoding used to interpret any ``str`` objects decoded by this instance (utf-8 by default). It has no effect when decoding ``unicode`` objects."

So to make sure we're on par with the stdlib json implementation, we need to support an encoding argument to allow non-UTF-8 encodings, and otherwise assume we're passed a UTF-8 file object that is safe to wrap with codecs.getreader('utf-8')(fp).
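A rough sketch of what that could look like, assuming Python 3 and a binary file object. The function below is an illustration only, not the actual jsonutils.load(); its signature and the sample data are assumptions.

```python
import codecs
import io
import json


def load(fp, encoding="utf-8"):
    """Illustrative only: wrap a binary file object in a decoding reader."""
    # codecs.getreader() returns a StreamReader class for the encoding;
    # calling it with fp yields a reader whose read() returns text.
    return json.load(codecs.getreader(encoding)(fp))


# Default case: the file object is assumed to hold UTF-8 data.
utf8_data = load(io.BytesIO('{"name": "caf\u00e9"}'.encode("utf-8")))

# Another ASCII-based encoding: the caller has to name it explicitly.
latin1_bytes = '{"name": "caf\u00e9"}'.encode("latin-1")
latin1_data = load(io.BytesIO(latin1_bytes), encoding="latin-1")

# Applying the UTF-8 reader to latin-1 bytes fails, which is why
# hard-coding codecs.getreader('utf-8') is not enough.
try:
    load(io.BytesIO(latin1_bytes))
    utf8_reader_failed = False
except UnicodeDecodeError:
    utf8_reader_failed = True
```

Defaulting encoding to 'utf-8' keeps the common case simple while still letting callers with latin-1 or similar data pass the right name.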