Imports non-UTF-8 characters

Bug #54736 reported by Jelmer Vernooij
Affects: Bazaar Subversion Plugin
Status: Fix Released
Importance: Low
Assigned to: Jelmer Vernooij
Milestone: none listed

Bug Description

Subversion requires its metadata to be in UTF-8, but does not enforce this. Bazaar will complain about invalid data, so bzr-svn needs to filter non-UTF-8 characters out of Subversion metadata (log messages, author names, file names).
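
As a rough illustration of the filtering described above, here is a minimal Python 3 sketch. The function name `sanitize_svn_text` is hypothetical and is not taken from bzr-svn; the sketch assumes the metadata arrives as raw bytes and that replacing undecodable bytes with U+FFFD is acceptable (dropping them with `errors="ignore"` would be the other obvious choice).

```python
def sanitize_svn_text(raw: bytes) -> str:
    """Decode Subversion metadata as UTF-8, tolerating invalid bytes.

    Hypothetical helper for illustration only. Bytes that are not valid
    UTF-8 are replaced with U+FFFD instead of raising an exception.
    """
    return raw.decode("utf-8", errors="replace")


# Valid UTF-8 passes through unchanged.
valid = sanitize_svn_text(b"Ex\xc3\xa1menes")

# A stray Latin-1 byte (0xe9) is not valid UTF-8 and gets replaced.
repaired = sanitize_svn_text(b"caf\xe9")
```

Replacing rather than dropping keeps a visible marker where data was lost, which may make a damaged log message or author name easier to spot later.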

Jelmer Vernooij (jelmer)
Changed in bzr-svn:
assignee: nobody → jelmer
status: Unconfirmed → Confirmed
Jelmer Vernooij (jelmer)
Changed in bzr-svn:
importance: Undecided → High
Jelmer Vernooij (jelmer)
Changed in bzr-svn:
status: Confirmed → Fix Committed
Jelmer Vernooij (jelmer)
Changed in bzr-svn:
status: Fix Committed → Fix Released
Carlos Perelló Marín (carlos) wrote :

I think this is not completely fixed.

I'm trying to convert an old SVN tree I have around and I get an error while executing:

bzr svn-import --shared --all --scheme=none http://svn-repository

...

svn update -r 42 ''
added revision_id {svn-v2:42@87d805a7-88b8-0310-b97b-b17772758398-}
svn update -r 43 ''
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 650, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 612, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 304, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 622, in ignore_pipe
    result = func(*args, **kwargs)
  File "/usr/lib/python2.5/site-packages/bzrlib/plugins/svn/__init__.py", line 153, in run
    all)
  File "/usr/lib/python2.5/site-packages/bzrlib/plugins/svn/convert.py", line 113, in convert_repository
    source_repos.copy_content_into(target_repos)
  File "/usr/lib/python2.5/site-packages/bzrlib/repository.py", line 268, in copy_content_into
    return InterRepository.get(self, destination).copy_content(revision_id, basis)
  File "/usr/lib/python2.5/site-packages/bzrlib/plugins/svn/fetch.py", line 402, in copy_content
    svn.ra.reporter2_invoke_finish_report(reporter, reporter_baton, pool)
  File "/var/lib/python-support/python2.5/libsvn/ra.py", line 745, in svn_ra_reporter2_invoke_finish_report
    return apply(_ra.svn_ra_reporter2_invoke_finish_report, args)
  File "/usr/lib/python2.5/site-packages/bzrlib/plugins/svn/fetch.py", line 117, in add_directory
    file_id = self._get_new_id(parent_id, path)
  File "/usr/lib/python2.5/site-packages/bzrlib/plugins/svn/fetch.py", line 103, in _get_new_id
    return generate_file_id(self.revid, new_path)
  File "/usr/lib/python2.5/site-packages/bzrlib/plugins/svn/fileids.py", line 56, in generate_file_id
    return generate_svn_file_id(uuid, revnum, branch, path)
  File "/usr/lib/python2.5/site-packages/bzrlib/plugins/svn/fileids.py", line 44, in generate_svn_file_id
    ret = "%s-%s" % (introduced_revision_id, escape_svn_path(path))
  File "/usr/lib/python2.5/site-packages/bzrlib/plugins/svn/repository.py", line 63, in escape_svn_path
    return unicode(''.join(r))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

return code 3

The problem with that revision is that I started using non-ASCII characters in some file names, such as:

Exámenes

I'm using Ubuntu Feisty package of bzr-svn version 0.3-0ubuntu2
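
The traceback above ends in `unicode(''.join(r))`. In Python 2, calling `unicode()` on a byte string without naming an encoding decodes with the default ASCII codec, so any UTF-8 encoded non-ASCII character fails. The Python 3 sketch below reproduces that class of failure with an explicit ASCII decode and shows that a UTF-8 decode succeeds. The name `decode_svn_path` is hypothetical, and this is not the actual bzr-svn fix, which is not shown in this report. The failing offset differs from the traceback because the example path differs.

```python
def decode_svn_path(raw: bytes) -> str:
    """Hypothetical helper: decode a Subversion path explicitly as UTF-8."""
    return raw.decode("utf-8")


raw_path = "Exámenes".encode("utf-8")  # the 'á' becomes the bytes 0xc3 0xa1

# An ASCII decode is what an encoding-less Python 2 unicode() call does.
try:
    raw_path.decode("ascii")
    failure = None
except UnicodeDecodeError as exc:
    # Record the codec, the offset of the bad byte, and the byte itself.
    failure = (exc.encoding, exc.start, raw_path[exc.start])

# Naming the encoding avoids the error.
decoded = decode_svn_path(raw_path)
```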

Changed in bzr-svn:
status: Fix Released → Unconfirmed
Jelmer Vernooij (jelmer) wrote : Re: [Bug 54736] Re: Imports non-UTF-8 characters

  importance low

Lowering priority, as this doesn't cause invalid characters to be
imported into Bazaar, which would corrupt repositories.

--
Jelmer Vernooij <email address hidden> - http://samba.org/~jelmer/

Jelmer Vernooij (jelmer)
Changed in bzr-svn:
importance: High → Low
Jelmer Vernooij (jelmer) wrote :

  status fixcommitted

Fixed in the 0.3 branch, should be part of 0.3.2, which will be released
in a few days.

Jelmer Vernooij (jelmer)
Changed in bzr-svn:
status: Unconfirmed → Fix Committed
Jelmer Vernooij (jelmer)
Changed in bzr-svn:
status: Fix Committed → Fix Released