unicode unsupported

Bug #91638 reported by Tollef Fog Heen
This bug affects 1 person

Bug Description

If you create an SVN repository containing files with non-ASCII names, svn2bzr fails to convert the repository:

: tfheen@golem /tmp/trunk > ./svn2bzr.py ../d/dump blah
Revision 0 read
/usr/lib/python2.5/site-packages/bzrlib/osutils.py:859: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if head == base:
Traceback (most recent call last):
  File "./svn2bzr.py", line 1073, in <module>
  File "./svn2bzr.py", line 1066, in main
    opts.prefix, opts.filter)
  File "./svn2bzr.py", line 999, in svn2bzr
  File "./svn2bzr.py", line 471, in run
    self.add_file(node_path, content)
  File "./svn2bzr.py", line 214, in add_file
    abspath = brt.tree.abspath(path_brt)
  File "/usr/lib/python2.5/site-packages/bzrlib/workingtree.py", line 384, in abspath
    return pathjoin(self.basedir, filename)
  File "/usr/lib/python2.5/posixpath.py", line 65, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
: tfheen@golem /tmp/trunk >

I'll attach the repository dump. It was generated by the following sequence:

: tfheen@golem /tmp/d > svnadmin create dir
: tfheen@golem /tmp/d > mkdir tmp
: tfheen@golem /tmp/d > cd tmp
: tfheen@golem /tmp/d/tmp > svn co file:///tmp/d/dir
Checked out revision 0.
: tfheen@golem /tmp/d/tmp > cd dir
: tfheen@golem /tmp/d/tmp/dir > touch æøå
: tfheen@golem /tmp/d/tmp/dir > svn add æøå
A æøå
: tfheen@golem /tmp/d/tmp/dir > svn ci æøå -m'Blah'
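The traceback ends in `posixpath.join`, where Python 2 evaluates `path += '/' + b`: the byte-string file name is appended to what is presumably a unicode base directory, so Python 2 implicitly decodes the bytes as ASCII. The Python 3 snippet below (an illustration, not svn2bzr code) reproduces that decode step in isolation. Subversion stores paths as UTF-8, so `æ`, `ø` and `å` each become two bytes beginning with `0xc3`, which the ASCII codec rejects; this is consistent with "byte 0xc3 in position 1", the first byte of `æ` right after the `/`. Decoding the same bytes as UTF-8 succeeds.

```python
# Python 3 illustration only; svn2bzr itself ran on Python 2.5.
name = "æøå"
raw = name.encode("utf-8")  # UTF-8 bytes, as Subversion stores paths

# Python 2 implicitly decoded byte strings as ASCII when combining them with
# unicode strings; that decode fails on the first non-ASCII byte.
ascii_error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = exc

print(raw)          # b'\xc3\xa6\xc3\xb8\xc3\xa5'
print(ascii_error)  # 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

decoded = raw.decode("utf-8")  # the UTF-8 codec recovers the original name "æøå"
```

Here the error reports position 0 because the bytes are decoded on their own, without the leading `/` that `posixpath.join` adds.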

Tollef Fog Heen (tfheen) wrote :
Pascal Bach (pascal-bach) wrote : Possible Solution

I wrote a small patch. It worked with your dump file and with my own branches, which have French characters in their filenames.
But I currently don't know enough about the internals of bzrlib to say whether it has any side effects.

Christian Tschabuschnig (tschaboo) wrote : Re: does not handle non-ascii file names

Your patch worked for my import, but when I later rename the problematic directory (its name includes the letter 'Ä'), I get an error.

Alexandre (lexrupy) wrote :

The patch worked for me. I don't know which file was the problematic one, since the traceback doesn't name the file, only the invalid character. However, when I checked out the latest version directly from SVN, created a new bzr repository and committed the files there, I got no errors.
Maybe the problematic file existed only at some point in the history, not in HEAD.
In any case, the patch seems to be working.
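Since the traceback names only the offending byte and not the file, one way to find non-ASCII paths anywhere in the history (not just in HEAD) is to scan the dump's `Node-path:` and `Node-copyfrom-path:` headers. A rough Python 3 sketch, assuming a dump produced by `svnadmin dump`; `find_non_ascii_paths` is a hypothetical helper, not part of svn2bzr, and because it matches lines naively instead of honouring content lengths, a header-like line inside a file's contents would also be reported.

```python
import io
from typing import BinaryIO, List, Optional, Tuple

# Header prefixes from the svnadmin dump format.
REVISION = b"Revision-number: "
PATH_HEADERS = (b"Node-path: ", b"Node-copyfrom-path: ")


def find_non_ascii_paths(dump: BinaryIO) -> List[Tuple[Optional[int], str]]:
    """Return (revision, path) pairs for dump paths containing non-ASCII bytes.

    Naive line scan: content lengths are not parsed, so a line inside file
    contents that looks like a header would also be matched.
    """
    revision: Optional[int] = None
    hits: List[Tuple[Optional[int], str]] = []
    for line in dump:
        line = line.rstrip(b"\n")
        if line.startswith(REVISION):
            revision = int(line[len(REVISION):])  # int() accepts ASCII digits as bytes
            continue
        for header in PATH_HEADERS:
            if line.startswith(header):
                raw = line[len(header):]
                if any(byte > 0x7F for byte in raw):
                    # Paths are UTF-8 in the dump; "replace" keeps the scan going on bad bytes.
                    hits.append((revision, raw.decode("utf-8", errors="replace")))
    return hits


# Tiny in-memory dump: revision 1 adds "æøå", revision 2 copies it to "copy".
sample = io.BytesIO(b"\n".join([
    b"SVN-fs-dump-format-version: 2",
    b"",
    b"Revision-number: 1",
    b"Node-path: " + "æøå".encode("utf-8"),
    b"Node-kind: file",
    b"Node-action: add",
    b"",
    b"Revision-number: 2",
    b"Node-path: copy",
    b"Node-kind: file",
    b"Node-action: add",
    b"Node-copyfrom-rev: 1",
    b"Node-copyfrom-path: " + "æøå".encode("utf-8"),
]) + b"\n")

hits = find_non_ascii_paths(sample)  # [(1, "æøå"), (2, "æøå")]
```

To scan a real dump, open it in binary mode (`open(path, "rb")`) and pass the file object to the same function.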

Markus Birth (mbirth) wrote :

I also had to change line 324 to:

               copy_dest_path = os.path.join(dest_path, tail.decode("utf8"))

This was needed because it choked on branch/tag/trunk structures with that same UnicodeDecodeError. With this change, together with the patch above, it worked flawlessly.
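The change above follows a general pattern: decode path components read from the dump from UTF-8 bytes to text once, before any path arithmetic, so that `os.path.join` never mixes the two types. A minimal Python 3 sketch of that pattern, assuming the dump's paths are UTF-8 bytes; `to_text` and `copy_destination` are hypothetical helpers, not functions in svn2bzr or bzrlib. Unlike Python 2, Python 3 refuses to mix `str` and `bytes` in `os.path.join` and raises `TypeError` instead of guessing an encoding.

```python
import os.path
from typing import Union

PathPart = Union[str, bytes]


def to_text(part: PathPart) -> str:
    """Return a path component as text, decoding UTF-8 bytes if necessary.

    Hypothetical helper: assumes byte strings from the dump are UTF-8,
    which is how Subversion stores paths. Invalid UTF-8 still raises.
    """
    if isinstance(part, bytes):
        return part.decode("utf-8")
    return part


def copy_destination(dest_path: PathPart, tail: PathPart) -> str:
    """Join a copy destination with the remaining path, both as text."""
    return os.path.join(to_text(dest_path), to_text(tail))


# Mixing text and bytes fails outright in Python 3.
mixed_error = None
try:
    os.path.join("branches/test", "æøå".encode("utf-8"))
except TypeError as exc:
    mixed_error = exc

# Decoding first makes the join well-defined.
joined = copy_destination("branches/test", "æøå".encode("utf-8"))
```

Decoding at one boundary, rather than sprinkling `.decode("utf8")` calls at individual join sites, makes it less likely that another code path (such as the rename case reported above) is missed.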

Martin Spacek (mspacek) wrote :

This has been added to the TODO. I don't know enough (or anything!) about Unicode at this point to feel confident about any fixes.

Changed in svn2bzr:
importance: Undecided → Medium
status: New → Confirmed
Martin Spacek (mspacek)
summary: - does not handle non-ascii file names
+ unicode unsupported