Comment 20 for bug 1314129

Ihar Hrachyshka (ihar-hrachyshka) wrote:

Several unicode-related issues were revealed by actual syncs from oslo-incubator to the affected projects.

https://review.openstack.org/91044: nova expects json.loads() to always return unicode strings.
https://review.openstack.org/91068: trove expects json.loads() to always return unicode strings.
https://review.openstack.org/91047: glance expects json.load() to always return unicode strings.

The simplejson library applies optimisations that do not guarantee unicode strings when ASCII-only bytes input is parsed. That's why we need to pass unicode strings and file objects to the underlying json implementation. For this, another oslo-incubator patch was sent to make sure json.load() and json.loads() always return unicode strings inside json dictionaries: https://review.openstack.org/91344
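
For illustration, here's a minimal sketch of the behaviour in question (Python 2 semantics; whether simplejson actually hands back byte strings here depends on its version and on the C speedups being in use):

    # Illustrative only: parsing ASCII-only bytes with simplejson may yield
    # byte strings, while stdlib json and unicode input always yield unicode.
    import json
    import simplejson

    print(type(json.loads(b'{"key": "value"}')['key']))        # <type 'unicode'>
    print(type(simplejson.loads(b'{"key": "value"}')['key']))  # may be <type 'str'>
    print(type(simplejson.loads(u'{"key": "value"}')['key']))  # <type 'unicode'>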

After discussion with Doug in IRC and Gerrit, we've come to the conclusion that we can't just apply codecs.getreader('utf-8') to the fp argument of jsonutils.load(), since this may fail for ASCII-based encodings other than UTF-8. As per the official json module documentation at https://docs.python.org/2/library/json.html#json.load

"If the contents of fp are encoded with an ASCII based encoding other than UTF-8 (e.g. latin-1), then an appropriate encoding name must be specified. Encodings that are not ASCII based (such as UCS-2) are not allowed, and should be wrapped with codecs.getreader(encoding)(fp), or simply decoded to a unicode object and passed to loads()."

Also, as per JSONDecoder.__init__() in the official json.decoder module,

"``encoding`` determines the encoding used to interpret any ``str`` objects decoded by this instance (utf-8 by default). It has no effect when decoding ``unicode`` objects."

So to stay on par with the stdlib json implementation, we need to support an encoding argument to allow non-UTF-8 encodings, and otherwise assume we're passed a UTF-8 file object that is safe to wrap with codecs.getreader('utf-8')(fp).
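
A minimal sketch of what that could look like (illustrative only, not necessarily the exact code proposed in https://review.openstack.org/91344):

    import codecs
    import json


    def load(fp, encoding='utf-8', **kwargs):
        # Wrap the byte stream so the underlying implementation (stdlib json
        # or simplejson) always sees unicode.
        return json.load(codecs.getreader(encoding)(fp), **kwargs)


    def loads(s, encoding='utf-8', **kwargs):
        # Decode byte string input up front for the same reason.
        if isinstance(s, bytes):
            s = s.decode(encoding)
        return json.loads(s, **kwargs)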