Comment 2 for bug 273978

mijutu (mijutu) wrote : Re: [Bug 273978] Re: bzr wrongly assumes error messages written in utf8 to be ascii

On Thu, 2009-11-12 at 12:59 +0000, John A Meinel wrote:
> I'm pretty sure we need more context to be able to determine what is
> going on here.

mijutu@crc11-ett:~$ cd /tmp/
mijutu@crc11-ett:/tmp$ rm -rf test
mijutu@crc11-ett:/tmp$ mkdir test
mijutu@crc11-ett:/tmp$ chmod a-rx test
mijutu@crc11-ett:/tmp$ echo $LANG
fi_FI.UTF-8
mijutu@crc11-ett:/tmp$ bzr branch test/someproject newbranch
bzr: ERROR: Unprintable exception PermissionDenied: dict={'path':
u'/tmp/test/someproject/.bzr/branch-format', '_preformatted_string':
None, 'extra': ": [Errno 13] Lupa ev\xc3\xa4tty:
u'/tmp/test/someproject/.bzr/branch-format'"}, fmt='Permission denied:
"%(path)s"%(extra)s', error=UnicodeDecodeError('ascii', ": [Errno 13]
Lupa ev\xc3\xa4tty: u'/tmp/test/someproject/.bzr/branch-format'", 20,
21, 'ordinal not in range(128)')
mijutu@crc11-ett:/tmp$ LANG=C bzr branch test/someproject newbranch
bzr: ERROR: Permission denied:
"/tmp/test/someproject/.bzr/branch-format": [Errno 13] Permission
denied: u'/tmp/test/someproject/.bzr/branch-format'
mijutu@crc11-ett:/tmp$ python --version
Python 2.5.4
mijutu@crc11-ett:/tmp$ bzr --version
Bazaar (bzr) 1.16.1

Whoops, old bzr. Let's try again.

mijutu@crc11-ett:/tmp$ bzr branch test/someproject newbranch
bzr: ERROR: Unprintable exception PermissionDenied: dict={'path':
u'/tmp/test/someproject/.bzr/branch-format', '_preformatted_string':
None, 'extra': ": [Errno 13] Lupa ev\xc3\xa4tty:
u'/tmp/test/someproject/.bzr/branch-format'"}, fmt='Permission denied:
"%(path)s"%(extra)s', error=UnicodeDecodeError('ascii', ": [Errno 13]
Lupa ev\xc3\xa4tty: u'/tmp/test/someproject/.bzr/branch-format'", 20,
21, 'ordinal not in range(128)')
mijutu@crc11-ett:/tmp$ bzr --version
Bazaar (bzr) 2.0.2

http://wiki.python.org/moin/UnicodeDecodeError

Python seems to default to latin1 somewhere:

$ python
Python 2.5.4 (r254:67916, Sep 26 2009, 10:32:22)
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> "aäa".decode("utf-8")
u'a\xe4a'

I have not mentioned latin1 anywhere (locale is utf-8), but still I see
"a\xe4a".
0xE4 is ä in latin1. Where does that come from?

"Lupa evätty" is latin1-compatible, so I guess this latin1 assumption is
not responsible for the error.
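
A note on the u'a\xe4a' output above: as far as I can tell this is not a
latin1 default. The repr of a unicode string shows code points, and the
code point of ä is U+00E4, which happens to coincide with its latin1 byte
value. A short check, written in Python 3 syntax (the bytes literal stands
in for the Python 2 byte string the terminal produced):

```python
# The terminal sent "ä" as the two UTF-8 bytes 0xC3 0xA4.
raw = b"a\xc3\xa4a"

# Decoding as UTF-8 gives three characters; the middle one is U+00E4.
text = raw.decode("utf-8")
code_points = [hex(ord(ch)) for ch in text]
print(code_points)

# latin1 maps byte values 0-255 straight onto code points 0-255,
# which is why the number 0xE4 shows up in both places.
same_number = "\xe4".encode("latin-1") == b"\xe4"
utf8_bytes = "\xe4".encode("utf-8")
print(same_number, utf8_bytes)
```

So the decode itself used utf-8 as requested; only the escaped display
looks like latin1.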

>>> "ä".decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0:
ordinal not in range(128)

Even though the error messages are not exactly the same, it seems that
somewhere in bzr the permission denied error message from libc is
converted to a Python unicode string on the assumption that it is ascii.
Instead of assuming ascii, bzr should check LC_MESSAGES and choose
encoding accordingly.
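
A minimal sketch of that suggestion, in Python 3 syntax with a bytes
literal standing in for the Python 2 byte string. decode_os_message is a
hypothetical helper, not bzr code. It relies on
locale.getpreferredencoding(), which reports one encoding for the whole
locale rather than LC_MESSAGES specifically, and it substitutes
replacement characters instead of raising when bytes do not fit.

```python
import locale

# The 'extra' value from the error report above, as raw bytes.
EXTRA = b": [Errno 13] Lupa ev\xc3\xa4tty: u'/tmp/test/someproject/.bzr/branch-format'"


def decode_os_message(raw, encoding=None):
    """Hypothetical helper: decode an OS error message without raising."""
    if encoding is None:
        # Assumption: OS messages use the encoding of the user's locale.
        encoding = locale.getpreferredencoding(False) or "ascii"
    return raw.decode(encoding, "replace")


# Assuming ascii fails on the first byte of "ä", as in the report.
try:
    EXTRA.decode("ascii")
    failed_at = None
except UnicodeDecodeError as exc:
    failed_at = exc.start

print(failed_at)
print(decode_os_message(EXTRA, "utf-8"))
```

The failing offset is the same one shown in the UnicodeDecodeError above
(20, 21), which supports the guess that an implicit ascii decode of this
string is what breaks.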