crash on checkout from SVN WC w/ unicode name

Bug #185401 reported by Wesley J. Landaker
Affects: Bazaar Subversion Plugin
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Using bzr-svn 0.4 branch rev 853, I tried the following:

1. svn co file:///path/to/repos trunk
2. bzr branch trunk trunk-bzr

This usually works flawlessly. However, this repository has one directory named "I²C" -- I've verified it's correct UTF-8 in SVN, and my locale is (and always has been) UTF-8. The directory name, when checked out with SVN, is definitely UTF-8 as well. See below for an example:

$ ls -d I²C/ | xxd
0000000: 49c2 b243 2f0a I..C/.

Note that 0xc2 0xb2 is U+00B2 ("²") in UTF-8.
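
The byte-level claim above can be checked independently of the shell. The following is a minimal sketch in Python 3 (the report itself uses Python 2.4); the hex string is copied from the xxd dump, and the helper name `code_points` is purely illustrative.

```python
# Bytes copied from the xxd dump of `ls -d I²C/` shown above.
raw = bytes.fromhex("49c2b2432f0a")


def code_points(data):
    """Decode UTF-8 bytes and list the resulting code points.

    decode() raises UnicodeDecodeError if the bytes are not valid UTF-8,
    so a clean return is itself evidence that the name is well-formed.
    """
    text = data.decode("utf-8")
    return ["U+%04X" % ord(ch) for ch in text]


points = code_points(raw)
print(points)

# The two bytes 0xc2 0xb2 should collapse into the single code point U+00B2.
print("\u00b2".encode("utf-8") == b"\xc2\xb2")
```

If the directory name were stored in some other encoding (Latin-1, for instance), "²" would appear as the single byte 0xb2 and the decode step would fail.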

Anyway, the error I get is the following:

$ bzr branch trunk trunk-bzr
bzr: ERROR: libsvn._core.SubversionException: ("Can't convert string from native encoding to 'UTF-8':", 22)

Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/bzrlib/commands.py", line 806, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib/python2.4/site-packages/bzrlib/commands.py", line 762, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.4/site-packages/bzrlib/commands.py", line 492, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.4/site-packages/bzrlib/builtins.py", line 877, in run
    accelerator_tree, br_from = bzrdir.BzrDir.open_tree_or_branch(
  File "/usr/lib/python2.4/site-packages/bzrlib/bzrdir.py", line 774, in open_tree_or_branch
    return bzrdir._get_tree_branch()
  File "/usr/lib/python2.4/site-packages/bzrlib/bzrdir.py", line 756, in _get_tree_branch
    tree = self.open_workingtree()
  File "/home/wjlanda/.bazaar/plugins/svn/workingtree.py", line 741, in open_workingtree
    return SvnWorkingTree(self, self.local_path, self.open_branch())
  File "/home/wjlanda/.bazaar/plugins/svn/workingtree.py", line 80, in __init__
    status = svn.wc.revision_status(self.basedir, None, True, None, None)
  File "/var/lib/python-support/python2.4/libsvn/wc.py", line 1577, in svn_wc_revision_status
    return apply(_wc.svn_wc_revision_status, args)
SubversionException: ("Can't convert string from native encoding to 'UTF-8':", 22)

bzr 1.1.0.candidate.1 on python 2.4.4.final.0 (linux2)
arguments: ['/usr/bin/bzr', 'branch', 'trunk', 'trunk-bzr']
encoding: 'UTF-8', fsenc: 'UTF-8', lang: 'en_US.UTF-8'
plugins:
  bzrtools /usr/lib/python2.4/site-packages/bzrlib/plugins/bzrtools [1.1.0]
  email /usr/lib/python2.4/site-packages/bzrlib/plugins/email [unknown]
  gtk /usr/lib/python2.4/site-packages/bzrlib/plugins/gtk [0.93.0]
  launchpad /usr/lib/python2.4/site-packages/bzrlib/plugins/launchpad [unknown]
  multiparent /usr/lib/python2.4/site-packages/bzrlib/plugins/multiparent.pyc [unknown]
  rebase /usr/lib/python2.4/site-packages/bzrlib/plugins/rebase [0.3.0]
  svn /home/wjlanda/.bazaar/plugins/svn [0.4.7dev0]
*** Bazaar has encountered an internal error.
    Please report a bug at https://bugs.launchpad.net/bzr/+filebug
    including this traceback, and a description of what you
    were doing when the error occurred.

Please let me know if there is other information I can provide to help fix this.

Wesley J. Landaker (wjl) wrote :

This may or may not be related, but I'll mention it. If I try to do a branch directly from the svn repo (which also normally works great), I get the following error, after about 5 minutes:

$ bzr branch svn+file:///path/to/repo trunk-bzr
bzr: ERROR: Path "I²C" is not unicode normalized

I tried running with -vv but didn't get any extra output. The path seems encoded just fine, looking at it this way:

$ bzr branch svn+file:///path/to/repo trunk-bzr 2>&1 | xxd
0000000: 627a 723a 2045 5252 4f52 3a20 5061 7468 bzr: ERROR: Path
0000010: 2022 49c2 b243 2220 6973 206e 6f74 2075 "I..C" is not u
0000020: 6e69 636f 6465 206e 6f72 6d61 6c69 7a65 nicode normalize
0000030: 640a

Again, note the 0xc2 0xb2, which is the correct UTF-8 encoding of U+00B2 ("²").
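
Valid UTF-8 and "unicode normalized" are separate properties, so it may help to see how "I²C" behaves under each Unicode normalization form. This is a minimal sketch in Python 3 using the standard `unicodedata` module. Which form (if any) bzr or bzr-svn checks here is not stated in this report, so treat any link to the error message as an assumption. The helper name `normalization_report` is illustrative only.

```python
import unicodedata

# The directory name from the report: "I", SUPERSCRIPT TWO (U+00B2), "C".
name = "I\u00b2C"


def normalization_report(text):
    """Map each normalization form to True if text is unchanged by it."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return {form: unicodedata.normalize(form, text) == text for form in forms}


report = normalization_report(name)
for form in sorted(report):
    print(form, "stable" if report[form] else "changed")

# U+00B2 has a compatibility decomposition to the plain digit "2",
# so the compatibility forms rewrite the name.
print(unicodedata.normalize("NFKC", name))
```

The name is stable under the canonical forms (NFC, NFD) but is rewritten to "I2C" under the compatibility forms (NFKC, NFKD). A check based on a compatibility form would therefore reject this path even though its encoding is correct.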

(Using svn+https vs. svn+file doesn't make any difference here, BTW.)

Anyway, if this ends up being unrelated and the previous error gets fixed, I'll go ahead and file this part as a separate bug. I think these might be related because the "I²C" name is the only non-ASCII name in the whole repository.

Wesley J. Landaker (wjl) wrote :

To help reproduce, I've attached a VERY simple repository and working copy that you should be able to use to debug the problem.

Basically, just:

$ tar -xzvf bzr-svn-185401.tar.gz
$ cd bzr-svn-185401
$ bzr branch svn-wc bzr-branch
[first error message with traceback]

You can see the other problem with:

$ bzr co svn-repo bzr-branch
bzr: ERROR: Path "I²C" is not unicode normalized

Hope this helps. Please let me know if there is anything else I can do!

Wesley J. Landaker (wjl) wrote :

Here is something interesting. I took a crack at debugging this further myself: I called the same function that bzr-svn calls in the traceback directly from an interactive Python session. This SEEMS to indicate a bug in python-svn, but I suppose it could still be a bug in how bzr-svn uses it. I don't know the python-svn API well enough to tell at a glance.

$ python
Python 2.4.4 (#2, Jan 3 2008, 13:36:28)
[GCC 4.2.3 20071123 (prerelease) (Debian 4.2.2-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import svn.wc
>>> svn.wc.revision_status("/tmp/bzr-svn-185401/svn-wc", None, True, None, None)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/var/lib/python-support/python2.4/libsvn/wc.py", line 1577, in svn_wc_revision_status
    return apply(_wc.svn_wc_revision_status, args)
libsvn._core.SubversionException: ("Can't convert string from native encoding to 'UTF-8':", 22)
>>> svn.wc.revision_status("/tmp/bzr-svn-185401/svn-wc", None, True, None, None)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/var/lib/python-support/python2.4/libsvn/wc.py", line 1577, in svn_wc_revision_status
    return apply(_wc.svn_wc_revision_status, args)
libsvn._core.SubversionException: ("Can't convert string from native encoding to 'UTF-8':", 22)
>>>
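
One thing that may be worth ruling out when repeating the session above is the locale active inside the Python process, which can differ from the shell's LANG. That this is the cause of the "native encoding" error is an assumption, not something this report confirms. The sketch below uses only the standard `locale` module, written for Python 3; the helper names `current_ctype_locale` and `adopt_environment_locale` are hypothetical.

```python
import locale


def current_ctype_locale():
    """Return the LC_CTYPE locale active in this process.

    Calling setlocale() with no second argument only queries the
    current setting; it does not change it.
    """
    return locale.setlocale(locale.LC_CTYPE)


def adopt_environment_locale():
    """Switch the process to the locale named by the environment.

    Returns the new locale string, or None if the environment names a
    locale that is not installed on this system.
    """
    try:
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        return None


before = current_ctype_locale()
after = adopt_environment_locale()
print("LC_CTYPE before:", before)
print("locale after setlocale(LC_ALL, ''):", after)
```

If the first line printed shows "C" while the shell reports en_US.UTF-8, then libraries that rely on the process locale to determine the native encoding would not treat paths as UTF-8 until the locale is adopted.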

Wesley J. Landaker (wjl) wrote :

I forwarded the last part on to the <email address hidden> list, since you have to post there before they'll let you file a bug in their tracker.
