Unicode crash on OpenBSD for valid file names

Bug #380007 reported by xVaultX
This bug affects 1 person
Affects: Bazaar
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

This happens on OpenBSD 4.4 and 4.5 at least (confirmed on both).
On a bzr checkout of a centralized repository, or on a bzr init of a working tree transferred locally through tar/untar (i.e. no add has been formally done yet), Bazaar crashes hard. Here is the traceback:

bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 2: ordinal not in range(128)

Traceback (most recent call last):
  File "/usr/local/lib/python2.5/site-packages/bzr-1.15rc1-py2.5-openbsd-4.4-i386.egg/bzrlib/commands.py", line 729, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/local/lib/python2.5/site-packages/bzr-1.15rc1-py2.5-openbsd-4.4-i386.egg/bzrlib/commands.py", line 924, in run_bzr
    ret = run(*run_argv)
  File "/usr/local/lib/python2.5/site-packages/bzr-1.15rc1-py2.5-openbsd-4.4-i386.egg/bzrlib/commands.py", line 560, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/local/lib/python2.5/site-packages/bzr-1.15rc1-py2.5-openbsd-4.4-i386.egg/bzrlib/builtins.py", line 1607, in run
    possible_transports=[to_transport])
  File "/usr/local/lib/python2.5/site-packages/bzr-1.15rc1-py2.5-openbsd-4.4-i386.egg/bzrlib/bzrdir.py", line 525, in create_branch_convenience
    bzrdir.create_workingtree()
  File "/usr/local/lib/python2.5/site-packages/bzr-1.15rc1-py2.5-openbsd-4.4-i386.egg/bzrlib/bzrdir.py", line 1599, in create_workingtree
    accelerator_tree=accelerator_tree, hardlink=hardlink)
  File "/usr/local/lib/python2.5/site-packages/bzr-1.15rc1-py2.5-openbsd-4.4-i386.egg/bzrlib/workingtree_4.py", line 1441, in initialize
    hardlink=hardlink, delta_from_tree=True)
  File "/usr/local/lib/python2.5/site-packages/bzr-1.15rc1-py2.5-openbsd-4.4-i386.egg/bzrlib/transform.py", line 2035, in build_tree
    delta_from_tree)
  File "/usr/local/lib/python2.5/site-packages/bzr-1.15rc1-py2.5-openbsd-4.4-i386.egg/bzrlib/transform.py", line 2051, in _build_tree
    for dir, files in wt.walkdirs():
  File "/usr/local/lib/python2.5/site-packages/bzr-1.15rc1-py2.5-openbsd-4.4-i386.egg/bzrlib/workingtree.py", line 2396, in walkdirs
    current_disk = disk_iterator.next()
  File "/usr/local/lib/python2.5/site-packages/bzr-1.15rc1-py2.5-openbsd-4.4-i386.egg/bzrlib/osutils.py", line 1347, in walkdirs
    names = sorted(_listdir(top))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 2: ordinal not in range(128)

Bazaar gives NO CLUE about which files triggered the error, and that is really bad.
I had to add a line like this to osutils.py:
print "============== _listdir(top)=============== :", top, ' ===\n', _listdir(top)
to finally get this:

============== _listdir(top)=============== : /var/www/htdocs/drupal/sites/xxxxxxxx/DesignDocQeFDrupal ===
[u'$node_stdClass_.txt', u'GeneralDesign.txt', u'mail.htm', u'AXIOMES ET CONCEPTS DEFINITIF XOU 2.doc', u'AXIOMES ET CONCEPTS DEFINITIF.doc', u'mail_fichiers', u'RoadToCompletion.txt', u'PRESENTATION LQDF.doc', 'Pr\xe9sentation DSCV.doc', u'Test_css']
[... traceback here ...]
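The listing above mixes unicode names (prefixed with u) with one plain byte string, 'Pr\xe9sentation DSCV.doc'. In Python 2, comparing a byte string with a unicode string implicitly decodes the bytes as ASCII, which is presumably what sorted() runs into here. The following is only a minimal Python 3 sketch of that coercion, not bzrlib code; the helper decode_like_python2 is a hypothetical name made up for illustration.

```python
# Python 3 sketch of the implicit ASCII decode that Python 2 performs when
# sorted() has to compare a unicode name with a byte-string name.
# decode_like_python2 is a hypothetical helper, not part of bzrlib.
names = [u"mail.htm", b"Pr\xe9sentation DSCV.doc", u"Test_css"]


def decode_like_python2(name):
    """Mimic Python 2's implicit str -> unicode coercion (ASCII only)."""
    if isinstance(name, bytes):
        return name.decode("ascii")  # fails on any byte >= 0x80
    return name


try:
    sorted(decode_like_python2(n) for n in names)
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The failing byte and its offset match the ones in the traceback.
print(error)
```

The exception carries the offending bytes in error.object and the offset in error.start, so the file name is available at the point of failure even though the original message does not show it.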

OK, here is how I got into that situation:

I work on a Windows machine for the development of a website.
I have a centralized checkout on an OpenBSD machine.
I had no trouble checking in and out and syncing through the bzr-svr on that machine.

When I went into production on a third box, I tried to simply retrieve the branch from the checkout, and bzr crashed.
To meet the deadline I tarred and untarred the working tree onto the production box, and all files came across perfectly.
Investigating a few days later, I came to understand that the file names were causing trouble in the two situations I mentioned:
* when trying to create a new branch (bzr init)
* when trying to check out the foreign branch

I understand that Unicode is complex and poorly supported on OpenBSD at the OS level. But since I can move the files from the Windows box to the BSD box and see their names properly, and can list, move, and delete them, I really don't expect bzr to crash on them and be unable to handle them.

I don't know for sure where in the process bzr ends up reading the file name "Présentation DSCV.doc" as "Pr\xe9sentation DSCV.doc" and then is unable to handle it, but it is really a show stopper.
\xe9 is, AFAIK, the proper code for "é" (e-acute): it is U+00E9 in Unicode and the single byte 0xE9 in Latin-1.
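To make the role of \xe9 concrete, here is a small Python 3 sketch (an illustration only, with variable names chosen for this example) showing how the same name is encoded in Latin-1 and in UTF-8. The single byte 0xE9 seen in the listing matches the Latin-1 form (Windows-1252 uses the same byte), which suggests, without proving it, that the name on disk is not stored as UTF-8.

```python
# "\u00e9" is e-acute; written as an escape so the snippet is pure ASCII.
name = "Pr\u00e9sentation DSCV.doc"

latin1_bytes = name.encode("latin-1")  # one byte per character
utf8_bytes = name.encode("utf-8")      # e-acute becomes two bytes

print(repr(latin1_bytes))
print(repr(utf8_bytes))

# The Latin-1 bytes are not valid UTF-8, and they are not ASCII either.
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print("Latin-1 form decodes as UTF-8:", valid_utf8)
```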

A try/except that reports the path and names that caused the trouble would be a boon for dealing with it in the very short term.
Files added to a repository are often named outside of the committer's control, and in some cases this can be a full stop to the use of bzr.
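As a rough idea of what such a report could look like, here is a Python 3 sketch. It is not a patch for bzrlib: describe_undecodable and check_directory are hypothetical helpers written for this example, and the ASCII default is an assumption taken from the error message above.

```python
import os


def describe_undecodable(top, raw_names, encoding="ascii"):
    """Return one message per byte-string name that fails to decode."""
    problems = []
    for raw in raw_names:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError as exc:
            problems.append(
                "%s: cannot decode %r as %s (byte 0x%02x at position %d)"
                % (top, raw, encoding, raw[exc.start], exc.start)
            )
    return problems


def check_directory(top, encoding="ascii"):
    """Scan a real directory; os.listdir() returns bytes for a bytes path."""
    return describe_undecodable(top, os.listdir(os.fsencode(top)), encoding)


# Example with names taken from the listing in this report.
for line in describe_undecodable(
    "DesignDocQeFDrupal", [b"mail.htm", b"Pr\xe9sentation DSCV.doc"]
):
    print(line)
```

Running check_directory on the working tree before bzr init or bzr checkout would list the problematic names up front instead of leaving them to be found by patching osutils.py.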

Especially if it bites you, like it did me, at a critical project moment.

The project I mentioned above is personal, but I am trying to push bzr at work for a more decentralized workflow, and that will be hard knowing this.

Thanks a lot for your time

Robert Collins (lifeless) wrote :

What are your locale settings?
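One way to collect the information asked for is sketched below. This is only a suggestion, written for Python 3; encoding_report is a hypothetical helper name, and the environment variables shown are the usual POSIX ones rather than anything specific to bzr.

```python
import locale
import os
import sys


def encoding_report():
    """Gather the locale-related settings that affect file name decoding."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
    }


for key, value in sorted(encoding_report().items()):
    print("%s = %s" % (key, value))
```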

Changed in bzr:
status: New → Incomplete