Unicode crash on OpenBSD for valid file names
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Bazaar | Incomplete | Undecided | Unassigned |
Bug Description
Confirmed on OpenBSD 4.4 and 4.5 at least.
On a bzr checkout of a centralized repository, OR on a bzr init of a working tree transferred locally through tar/untar (i.e. no ADD has been formally done yet), Bazaar crashes hard. Here is the traceback:
bzr: ERROR: exceptions.
Traceback (most recent call last):
File "/usr/local/
return the_callable(*args, **kwargs)
File "/usr/local/
ret = run(*run_argv)
File "/usr/local/
return self.run(
File "/usr/local/
possible_
File "/usr/local/
bzrdir.
File "/usr/local/
accelerator
File "/usr/local/
hardlink=
File "/usr/local/
delta_
File "/usr/local/
for dir, files in wt.walkdirs():
File "/usr/local/
current_disk = disk_iterator.
File "/usr/local/
names = sorted(
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 2: ordinal not in range(128)
Bazaar gives NO CLUE about which files triggered the error, and that is really bad.
I had to add a line to osutils.py like this:
print "============== _listdir(
to finally get this:
============== _listdir(
[u'$node_
[... traceback here ...]
OK, here is how I got into that situation:
I work on a Windows machine for the development of a website.
I have a centralized checkout on an OpenBSD machine.
I had no trouble checking in and out and syncing through the bzr-svr on that machine.
Then, when going into production on a third box, I tried to simply retrieve the branch from the checkout, and bzr crashed.
To meet the deadline I tarred and untarred the working tree onto the production box, and all files came across perfectly.
Investigating the subject a few days later, I came to understand that the file names were causing trouble in the two situations I mentioned:
* when trying to recreate a new branch (bzr init)
* when trying to checkout the foreign branch
I understand that Unicode is complex and poorly supported on OpenBSD at the OS level. But since I can move the files from the Windows box to the BSD box and see their names properly, and I can list them, move them, delete them and so on, I really don't expect bzr to crash on them and be unable to handle them.
I don't know for sure where in the process bzr ends up reading the file name "Présentation DSCV.doc" as "Pr\xe9sentation DSCV.doc" and then becomes unable to handle it, but it is really a show stopper.
\xe9 is AFAIK the Unicode code point of "é" (e-acute, U+00E9), which is also its single-byte Latin-1 encoding.
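To illustrate the byte in question, here is a minimal Python 3 sketch (the report itself concerns Python 2). It assumes the name is stored on disk as Latin-1 bytes, which is consistent with the traceback but not confirmed anywhere in this report.

```python
# Assumption: the file name is stored on disk as Latin-1 (ISO-8859-1) bytes.
name = "Pr\u00e9sentation DSCV.doc"  # "Présentation DSCV.doc"
raw = name.encode("latin-1")

# In Latin-1, "é" is the single byte 0xe9, at index 2 of the name.
byte_at_2 = raw[2]

# Decoding those bytes as ASCII fails in the same way as in the traceback.
try:
    raw.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# For comparison, UTF-8 would store "é" as the two bytes 0xc3 0xa9.
utf8_bytes = "\u00e9".encode("utf-8")

print(hex(byte_at_2))
print(error_message)
print(utf8_bytes)
```

If the name were stored as UTF-8 instead, the first failing byte would be 0xc3 rather than 0xe9, so the error message hints at a single-byte encoding on disk.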
A try/except that reported the path and the names that caused the trouble would be a great help in dealing with this in the very short term.
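A rough sketch of what such a wrapper could look like, in Python 3. The helper names `decode_names` and `listdir_with_context` are hypothetical, made up for this illustration. They are not part of bzr's osutils, and the default `ascii` encoding is only an assumption taken from the error message.

```python
import os


def decode_names(directory, raw_names, encoding="ascii"):
    """Hypothetical helper: decode raw (bytes) file names, and on failure
    report which directory and which name could not be decoded."""
    decoded = []
    for raw_name in raw_names:
        try:
            decoded.append(raw_name.decode(encoding))
        except UnicodeDecodeError as exc:
            # Re-raise with the context the bare traceback does not give.
            raise ValueError(
                "cannot decode file name %r in directory %r as %s: %s"
                % (raw_name, directory, encoding, exc)
            ) from exc
    return sorted(decoded)


def listdir_with_context(directory, encoding="ascii"):
    """Hypothetical helper: list a directory and decode its entries."""
    # os.listdir returns bytes names when it is given a bytes path.
    raw_names = os.listdir(os.fsencode(directory))
    return decode_names(directory, raw_names, encoding)
```

With something like this, the error would name both the directory and the offending file, instead of only the byte and its position.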
Many times, files added to a repository are named outside of the committer's control, and in some cases this can be a full stop to the use of bzr.
Especially if it bites you, like it did me, at a critical moment in a project.
The project I am talking about above, for instance, is personal, but I am trying to push bzr at work for a more decentralized workflow, and that will be hard knowing this.
Thanks a lot for your time
What are your locale settings?
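Besides the output of the `locale` command, one way to see which encodings Python itself derives from the environment is a small script like the one below. This is only a sketch for gathering the information. The function name `describe_encodings` is made up here, and how Bazaar uses these values is not established in this report.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the locale-related settings that Python can see."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "preferred_encoding": locale.getpreferredencoding(),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


for key, value in describe_encodings().items():
    print("%s = %r" % (key, value))
```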