UTF-8 error blocking import of Mercurial (hg) repository

Bug #838980 reported by Jim Slattery
This bug affects 1 person
Affects: Bazaar Fast Import
Status: Fix Released
Importance: Medium
Assigned to: Jelmer Vernooij
Milestone: 0.12.0

Bug Description

I have a Mercurial repository that exported successfully to the fast-import format. However, when I try to import, it fails with a 'utf8' error:

...
16:02:09 800/2100 commits processed at 199/minute (800)
16:03:52 900/2100 commits processed at 157/minute (900)
ABORT: exception occurred processing commit :901
bzr: ERROR: exceptions.UnicodeDecodeError: 'utf8' codec can't decode byte 0xb9 in position 14: unexpected code byte

Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/bzrlib/commands.py", line 946, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/lib/pymodules/python2.6/bzrlib/commands.py", line 1150, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/pymodules/python2.6/bzrlib/commands.py", line 699, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/pymodules/python2.6/bzrlib/commands.py", line 721, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/usr/lib/pymodules/python2.6/bzrlib/cleanup.py", line 135, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/bzrlib/cleanup.py", line 165, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/usr/lib/pymodules/python2.6/bzrlib/plugins/fastimport/cmds.py", line 314, in run
    user_map=user_map)
  File "/usr/lib/pymodules/python2.6/bzrlib/plugins/fastimport/cmds.py", line 40, in _run
    return proc.process(p.iter_commands)
  File "/usr/lib/pymodules/python2.6/bzrlib/plugins/fastimport/processors/generic_processor.py", line 311, in process
    super(GenericProcessor, self)._process(command_iter)
  File "/usr/lib/pymodules/python2.6/fastimport/processor.py", line 76, in _process
    handler(self, cmd)
  File "/usr/lib/pymodules/python2.6/bzrlib/plugins/fastimport/processors/generic_processor.py", line 536, in commit_handler
    handler.process()
  File "/usr/lib/pymodules/python2.6/fastimport/processor.py", line 158, in process
    handler(self, fc)
  File "/usr/lib/pymodules/python2.6/bzrlib/plugins/fastimport/bzr_commit_handler.py", line 890, in modify_handler
    self._modify_item(filecmd.path.decode('utf8'), kind,
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb9 in position 14: unexpected code byte

I'm running this on Ubuntu. bzr version "2.4.0-1~bazaar1~lucid1", and bzr-fastimport version "0.11.0-1~lucid1".

Maybe the problem is that some older Mercurial commits contain file paths that are not valid UTF-8?

Note: I was able to work around this issue by changing
"path.decode('utf-8')" to "path.decode('utf-8', 'replace')"
on lines 890 and 895 of /usr/lib/pymodules/python2.6/bzrlib/plugins/fastimport/bzr_commit_handler.py
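
For anyone hitting the same error, here is a minimal standalone sketch of what the 'replace' argument changes. It is not the plugin's code: decode_path and the sample path are made up for illustration, with a stray 0xb9 byte placed at position 14 to mirror the error above.

```python
def decode_path(raw_path, errors="strict"):
    """Decode a path given as bytes, using UTF-8 and the given error policy.

    Hypothetical helper for illustration; not part of bzr-fastimport.
    """
    return raw_path.decode("utf-8", errors)


# Made-up path: 0xb9 at index 14 is not a valid UTF-8 start byte.
bad_path = b"docs/notes/num\xb9.txt"

try:
    decode_path(bad_path)  # strict mode raises, as in the traceback
except UnicodeDecodeError as exc:
    print("strict decode failed at byte offset", exc.start)

# 'replace' substitutes U+FFFD for the undecodable byte instead of raising.
print(repr(decode_path(bad_path, "replace")))
```

Note that 'replace' is lossy: the imported path will contain U+FFFD where the original byte was, so the file name in the new history no longer matches the original exactly.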

Jelmer Vernooij (jelmer) wrote :

We should print a saner error (without a traceback) and allow the user to specify some way to ignore UTF-8-invalid data.
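
A rough sketch of what that behaviour could look like, purely as an illustration. This is not the change that was released in 0.12.0; InvalidPathEncoding, decode_import_path and the ignore_invalid flag are hypothetical names.

```python
class InvalidPathEncoding(Exception):
    """Hypothetical user-facing error, meant to be shown without a traceback."""


def decode_import_path(raw_path, commit_mark, ignore_invalid=False):
    """Decode a path (bytes) as UTF-8.

    With ignore_invalid=True (a hypothetical option), undecodable bytes
    are replaced with U+FFFD; otherwise a short, readable error is raised.
    """
    try:
        return raw_path.decode("utf-8")
    except UnicodeDecodeError as exc:
        if ignore_invalid:
            return raw_path.decode("utf-8", "replace")
        raise InvalidPathEncoding(
            "commit %s: path %r is not valid UTF-8 (byte offset %d)"
            % (commit_mark, raw_path, exc.start)
        )
```

A caller could catch InvalidPathEncoding at the top level and print only its message, which would give the user the commit mark and the offending path instead of a full traceback.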

Changed in bzr-fastimport:
status: New → Triaged
importance: Undecided → Medium
Jelmer Vernooij (jelmer)
Changed in bzr-fastimport:
status: Triaged → Fix Committed
assignee: nobody → Jelmer Vernooij (jelmer)
milestone: none → 0.12.0
Jelmer Vernooij (jelmer)
Changed in bzr-fastimport:
status: Fix Committed → Fix Released