Encoding inconsistency in user props
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Products.LDAPUserFolder | New | Undecided | Unassigned |
Bug Description
getUser returns user properties encoded in latin-1, or in utf-8 when latin-1 cannot represent the value. I think it should be consistent and always use utf-8, so that the programmer knows what to expect.
I will probably follow up with a patch; the cause seems to be in the utils module (from_utf8, to_utf8, ...).
Example:
```
# For "Stéphane", a latin-1 encoded string is returned:
>>> us = app.acl_
>>> us.cn
'St\xe9phane I...'
>>> us.cn.decode(
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "..../lib/
    return codecs.
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 2-4: invalid data
# For "Niţu", a utf-8 encoded string is returned:
>>> us = app.acl_
>>> us.cn.decode(
u'C... Ni\u021bu'
# Yet in LDAP, "Stéphane" is clearly utf-8 encoded:
>>> x = b64decode(
>>> print x.decode('utf-8')
Stéphane I...
```
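For illustration, a minimal Python 3 sketch (the session above is Python 2, where str is a byte string) of why the two names come back differently: é (U+00E9) exists in latin-1, while the t with comma in "Niţu" (U+021B, the code point in the decoded output above) does not. The helper `fits_latin1` is hypothetical and not part of Products.LDAPUserFolder.

```python
# Sketch (Python 3): why "Stéphane" and "Niţu" come back in different encodings.
# Assumption: "Niţu" uses U+021B, the code point shown in the decoded output.


def fits_latin1(text: str) -> bool:
    """Hypothetical helper: True if every character of text exists in latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


for name in ("St\u00e9phane", "Ni\u021bu"):
    # ascii() keeps the output printable regardless of the terminal encoding.
    print(ascii(name), "fits latin-1:", fits_latin1(name))

# The latin-1 bytes for "Stéphane" are not valid utf-8, matching the traceback.
try:
    "St\u00e9phane".encode("latin-1").decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 decode failed:", exc.reason)
```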
I got it!
There is a module-level default in utils.py: encoding = 'latin1'
It is used to re-encode the strings decoded from LDAP results; if re-encoding fails, the original LDAP value (utf-8) is kept, which is why some users' properties come back as utf-8.
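The mechanism described above can be sketched as follows. This is a minimal Python 3 approximation, not the actual utils.py code: the function name `reencode` is hypothetical, and LDAP values are assumed to arrive as utf-8 bytes. It reproduces both observations from the example: a value that latin-1 can represent is converted, and any other value silently keeps its utf-8 bytes.

```python
# Module-level default, mirroring the setting reported in utils.py.
encoding = "latin1"


def reencode(ldap_value: bytes, target: str = encoding) -> bytes:
    """Hypothetical re-encoding step applied to an LDAP attribute value.

    The utf-8 value from LDAP is decoded and re-encoded with `target`;
    if `target` cannot represent it, the original LDAP bytes are kept.
    """
    text = ldap_value.decode("utf-8")
    try:
        return text.encode(target)
    except UnicodeEncodeError:
        # Fallback: keep the raw utf-8 value, so this user's properties
        # arrive in a different encoding than those of other users.
        return ldap_value


stephane = "St\u00e9phane".encode("utf-8")
nitu = "Ni\u021bu".encode("utf-8")
print(reencode(stephane))  # b'St\xe9phane' (latin-1)
print(reencode(nitu))      # b'Ni\xc8\x9bu' (unchanged utf-8)
```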
I think it should default to 'utf-8' (OpenLDAP now uses utf-8 by default) and be configurable from the ZMI (probably in the 'Configure' tab).
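A sketch of the proposed change, assuming the encoding becomes a per-instance setting with a utf-8 default instead of a module constant. `EncodingSettings` is a hypothetical name standing in for wherever the ZMI 'Configure' tab would store the value; no real Zope or LDAPUserFolder API is used.

```python
from dataclasses import dataclass


@dataclass
class EncodingSettings:
    """Hypothetical per-folder setting, e.g. editable from the ZMI."""

    encoding: str = "utf-8"  # proposed default instead of 'latin1'

    def reencode(self, ldap_value: bytes) -> bytes:
        # LDAP attribute values are assumed to be utf-8 encoded.
        text = ldap_value.decode("utf-8")
        try:
            return text.encode(self.encoding)
        except UnicodeEncodeError:
            # Same fallback as today; never taken for valid input with utf-8.
            return ldap_value


settings = EncodingSettings()
for name in ("St\u00e9phane", "Ni\u021bu"):
    value = settings.reencode(name.encode("utf-8"))
    # With the utf-8 default, one codec decodes every user's properties.
    assert value.decode("utf-8") == name
```

With the utf-8 default the fallback branch is never reached for valid LDAP data, so callers can always decode with a single codec; choosing a narrower encoding in the setting would still bring back the mixed results.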