non-UTF8 paths in mercurial repository

Bug #486541 reported by Andy R Terrel
This bug affects 1 person
Affects: Bazaar Hg Plugin
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

I get the following traceback when I try to convert the Mercurial repository at http://www.csc.kth.se/~jjan/transfer/unicorn-2009-11-02.zip

I am using Mercurial 1.3.1 and Bazaar 2.0.2. Any help is appreciated.

-- Andy

aterrel@aterrel:~/scratch/bzr_fenics_apps$ bzr branch ../unicorn_zip/unicorn-stable
bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

Traceback (most recent call last):
  File "/usr/local/lib/python2.6/site-packages/bzrlib/commands.py", line 842, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/local/lib/python2.6/site-packages/bzrlib/commands.py", line 1037, in run_bzr
    ret = run(*run_argv)
  File "/usr/local/lib/python2.6/site-packages/bzrlib/commands.py", line 654, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/local/lib/python2.6/site-packages/bzrlib/builtins.py", line 1243, in run
    source_branch=br_from)
  File "/usr/local/lib/python2.6/site-packages/bzrlib/bzrdir.py", line 1185, in sprout
    result_repo.fetch(source_repository, fetch_spec=fetch_spec)
  File "/usr/local/lib/python2.6/site-packages/bzrlib/repository.py", line 1695, in fetch
    find_ghosts=find_ghosts, fetch_spec=fetch_spec)
  File "/usr/local/lib/python2.6/site-packages/bzrlib/decorators.py", line 192, in write_locked
    result = unbound(self, *args, **kwargs)
  File "/usr/local/lib/python2.6/site-packages/bzrlib/plugins/hg/fetch.py", line 665, in fetch
    self.addchangegroup(cg, mapping)
  File "/usr/local/lib/python2.6/site-packages/bzrlib/plugins/hg/fetch.py", line 508, in addchangegroup
    self._add_inventories(manifestchunks2, mapping, pb)
  File "/usr/local/lib/python2.6/site-packages/bzrlib/plugins/hg/fetch.py", line 430, in _add_inventories
    basis_inv)
  File "/usr/local/lib/python2.6/site-packages/bzrlib/repofmt/groupcompress_repo.py", line 872, in add_inventory_by_delta
    propagate_caches=propagate_caches)
  File "/usr/local/lib/python2.6/site-packages/bzrlib/inventory.py", line 1814, in create_by_apply_delta
    new_value = result._entry_to_bytes(entry)
  File "/usr/local/lib/python2.6/site-packages/bzrlib/inventory.py", line 1534, in _entry_to_bytes
    name_str = entry.name.encode("utf8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

bzr 2.0.2 on python 2.6.1 (Darwin-10.2.0-i386-64bit)
arguments: ['/usr/local/bin/bzr', 'branch', '../unicorn_zip/unicorn-stable']
encoding: 'UTF-8', fsenc: 'utf-8', lang: 'en_US.UTF-8'
plugins:
  hg /usr/local/lib/python2.6/site-packages/bzrlib/plugins/hg [0.1.0]
  launchpad /usr/local/lib/python2.6/site-packages/bzrlib/plugins/launchpad [2.0.2]
  netrc_credential_store /usr/local/lib/python2.6/site-packages/bzrlib/plugins/netrc_credential_store [2.0.2]

*** Bazaar has encountered an internal error. This probably indicates a
    bug in Bazaar. You can help us fix it by filing a bug report at
        https://bugs.launchpad.net/bzr/+filebug
    including this traceback and a description of the problem.

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

We currently assume that all paths inside a Mercurial repository are encoded in UTF-8. Your repository appears to contain paths that are not valid UTF-8.

Is there some way we can figure out the proper path encoding?

Changed in bzr-hg:
status: New → Incomplete
Revision history for this message
Jelmer Vernooij (jelmer) wrote :

It looks like this might be a Won't Fix :-( There is no definitive way we can determine the encoding. I guess we could fall back to ISO-8859-1 and just *hope* that is correct.
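
A minimal sketch of the fallback described above, written in present-day Python rather than taken from bzr-hg. The function name `decode_hg_path` and its `fallback` parameter are hypothetical and exist only for illustration. It tries UTF-8 first and falls back to ISO-8859-1 only when the bytes are not valid UTF-8:

```python
def decode_hg_path(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode a path stored as raw bytes, preferring UTF-8.

    Hypothetical helper for illustration; not part of bzr or bzr-hg.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte value to a code point, so this decode
        # cannot fail, but the result is only a guess at the intended name.
        return raw.decode(fallback)


# The same file name stored in two different encodings.
print(decode_hg_path(b"caf\xc3\xa9.txt"))  # bytes are valid UTF-8
print(decode_hg_path(b"caf\xe9.txt"))      # bytes are not valid UTF-8
```

Because the fallback never raises, a path written in some other 8-bit encoding would be converted silently to the wrong characters, which is the "hope" part of the suggestion.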

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 486541] Re: UnicodeDecodeError

Jelmer Vernooij wrote:
> It looks like this might be a Won't Fix :-( There is no definitive way we
> can determine the encoding. I guess we could fall back to ISO-8859-1 and
> just *hope* that is correct.
>

As I understand it, Mercurial just takes whatever bytes the 8-bit
filesystem gives it, and has no defined encoding.

That is also why committing on a UTF-8 filesystem on Linux and checking
out on Windows "breaks": Windows uses its own 8-bit (often latin-1)
encoding.

Anyway, I'm probably going to agree with Won't Fix, at least on the
grounds that "if we want to assume Unicode in bzr, we can't really
support arbitrary 8-bit strings."

John
=:->

Revision history for this message
Andy R Terrel (andy-terrel) wrote : Re: UnicodeDecodeError

Jelmer and John, thanks for your comments! I didn't know bzr only used UTF-8. Perhaps this needs to be moved somewhere else, but I would still like to get the Mercurial repository converted to bzr. Is there a way to do this? I'm asking the group about their encoding, but I wouldn't be surprised if they didn't know.

Thanks for your help.

Revision history for this message
Andy R Terrel (andy-terrel) wrote :

We figured out which files were causing the problem and were able to get around it by excluding them with "hg convert --filemap". Thanks for taking the time to look at this.
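
One way such a workaround might be automated, as a sketch only: the helper names `non_utf8_paths` and `filemap_excludes` and the sample manifest are hypothetical, not something used in this thread. It assumes the repository's paths are available as raw byte strings, and it emits one `exclude` directive per path that is not valid UTF-8, in the format read by `hg convert --filemap`. Paths containing spaces would need quoting, and whether the filemap parser accepts non-UTF-8 bytes unchanged has not been verified here.

```python
def non_utf8_paths(paths):
    """Return the byte paths that are not valid UTF-8."""
    bad = []
    for path in paths:
        try:
            path.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return bad


def filemap_excludes(paths):
    """Build filemap content with one `exclude` line per offending path."""
    return b"".join(b"exclude " + p + b"\n" for p in non_utf8_paths(paths))


# Hypothetical manifest; in practice the list would come from the repository.
manifest = [b"src/main.cpp", b"doc/r\xe9sum\xe9.txt", b"README"]
print(filemap_excludes(manifest))
```

The content is built as bytes rather than text so that the offending names reach the filemap exactly as they are stored.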

Jelmer Vernooij (jelmer)
summary: - UnicodeDecodeError
+ non-UTF8 paths in mercurial repository
Changed in bzr-hg:
status: Incomplete → Won't Fix