Encoding inconsistency in user props

Bug #1081149 reported by Mihnea Simian
This bug affects 2 people
Affects: Products.LDAPUserFolder
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: none

Bug Description

getUser returns user properties encoded in latin-1 when the value can be represented in latin-1, and in utf-8 otherwise. I think it should be consistent and always use utf-8, so that the programmer knows what to expect.

I will probably follow up with a patch; the cause seems to be in the utils module (from_utf8, to_utf8, ...).

Example:

# For "Stéphane" returns latin-1 encoded string:
>>> us = app.acl_users.getUser('isoarst')
>>> us.cn
'St\xe9phane I...'
>>> us.cn.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "..../lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 2-4: invalid data

# For Niţu returns utf-8 encoded string:
>>> us = app.acl_users.getUser('nituacor')
>>> us.cn.decode('utf-8')
u'C... Ni\u021bu'

# While in LDAP Stéphane is clearly utf-8 encoded:
>>> from base64 import b64decode
>>> x = b64decode('U3TDqXBoYW5lIElTT0FSRA==')
>>> print x.decode('utf-8')
Stéphane I...

Mihnea Simian (8mabmzqcnyc1g4i7-mcmth4f-clubl5mz6ldresgv) wrote :

I got it!
There's a default setting in the utils.py module: encoding = 'latin1'
It is used to re-encode the strings decoded from LDAP results; if that encoding fails, the original LDAP value is kept.

I think it should be 'utf-8' (OpenLDAP is now utf-8 by default) and configurable from the ZMI (probably in the 'Configure' tab).
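
To illustrate the behaviour described above, here is a minimal sketch. It is not the actual code from utils.py: the helper name reencode and its signature are hypothetical, and it only assumes what this report states, namely that LDAP values arrive utf-8 encoded, are decoded, re-encoded with the configured encoding, and kept unchanged when that fails.

```python
def reencode(raw, encoding="latin1"):
    """Hypothetical helper mimicking the behaviour described in this report.

    raw is a utf-8 encoded byte string as returned by LDAP. It is decoded
    and re-encoded with `encoding`; if the target encoding cannot represent
    the text, the original utf-8 bytes are returned unchanged.
    """
    text = raw.decode("utf-8")
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # Fallback: keep the LDAP value as-is (still utf-8)
        return raw


# Sample values only; they stand in for the cn attributes shown above
stephane = u"St\xe9phane".encode("utf-8")
nitu = u"Ni\u021bu".encode("utf-8")

print(repr(reencode(stephane)))           # fits in latin-1, so it is re-encoded
print(repr(reencode(nitu)))               # does not fit, stays utf-8
print(repr(reencode(stephane, "utf-8")))  # with 'utf-8' the output is consistent
```

With the default of 'latin1', the two values come back in different encodings, which matches the inconsistency shown in the bug description. With 'utf-8' as the setting, both stay utf-8.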

Jukka Ojaniemi (jukka-ojaniemi) wrote :

+1 for this. We are maintaining our own fork of Products.LDAPUserFolder just to change this one line.
