git: Doesn't handle non-utf8 characters

Bug #1489872 reported by Nicolas DERIVE
This bug affects 1 person
Affects             Status         Importance   Assigned to       Milestone
Bazaar Git Plugin   Confirmed      Undecided    Unassigned
Breezy              Fix Released   Wishlist     Jelmer Vernooij

Bug Description

When trying to import some git repo (Navitia) I got the following error:

2015-08-28 11:06:45 INFO Starting job.
2015-08-28 11:06:45 INFO Getting exising bzr branch from central store.
2015-08-28 11:06:45 INFO [chan bzr SocketAsChannelAdapter] Opened sftp connection (server version 3)
2015-08-28 11:06:45 INFO [chan bzr SocketAsChannelAdapter] Opened sftp connection (server version 3)
2015-08-28 11:06:45 INFO 57 bytes transferred
2015-08-28 11:06:46 INFO Importing branch.
2015-08-28 11:06:47 INFO Counting objects: 57459, done. 0
2015-08-28 11:07:22 INFO finding revisions to fetch:generating index 0/57459
2015-08-28 11:07:28 INFO finding revisions to fetch:generating index 0/57459
2015-08-28 11:07:32 INFO finding revisions to fetch 1
2015-08-28 11:07:37 INFO
Traceback (most recent call last):
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/scripts/code-import-worker.py", line 96, in <module>
    sys.exit(script.main())
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/scripts/code-import-worker.py", line 91, in main
    return import_worker.run()
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/lib/lp/codehosting/codeimport/worker.py", line 583, in run
    return self._doImport()
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/lib/lp/codehosting/codeimport/worker.py", line 737, in _doImport
    inter_branch.fetch(limit=revision_limit)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/branch.py", line 722, in fetch
    self.fetch_objects(stop_revision, fetch_tags=fetch_tags, limit=limit)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/branch.py", line 745, in fetch_objects
    determine_wants, self.source.mapping, limit=limit)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/fetch.py", line 718, in fetch_objects
    limit)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/fetch.py", line 516, in import_git_objects
    target_git_object_retriever, trees_cache)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/fetch.py", line 409, in import_git_commit
    False))
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/fetch.py", line 298, in import_git_tree
    lookup_file_id, allow_submodules=allow_submodules)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/fetch.py", line 298, in import_git_tree
    lookup_file_id, allow_submodules=allow_submodules)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/fetch.py", line 298, in import_git_tree
    lookup_file_id, allow_submodules=allow_submodules)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/fetch.py", line 312, in import_git_tree
    (child_base_mode, child_mode), store_updater, lookup_file_id)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/fetch.py", line 119, in import_git_blob
    ie = cls(file_id, name.decode("utf-8"), parent_id)
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe8 in position 23: invalid continuation byte
Import failed:
Traceback (most recent call last):
Failure: twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition: process ended with exit code 1

Tags: git

William Grant (wgrant)
affects: launchpad → bzr-git
Jelmer Vernooij (jelmer) wrote : Re: Doesn't handle non-utf8 characters

It's unclear what the best course of action is in situations like this. Bazaar uses Unicode internally, whereas Git treats file names as arbitrary byte strings.
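
To make the mismatch concrete, here is a minimal Python 3 sketch of the failure and of two generic ways a byte-string name could be turned into Unicode. It is not the fix that went into Breezy; the file name and the `decode_name` helper are hypothetical, and the traceback above comes from Python 2.7, where `surrogateescape` is not available.

```python
# Hypothetical file name as Git might store it: Latin-1 bytes, not valid UTF-8.
raw_name = "r\u00e8gle.txt".encode("latin-1")  # b'r\xe8gle.txt'


def decode_name(name: bytes, errors: str = "strict") -> str:
    """Decode a Git path component as UTF-8 with the given error handler."""
    return name.decode("utf-8", errors)


# Strict decoding fails the same way as the import worker did:
# 0xe8 starts a multi-byte sequence, but the next byte is plain ASCII.
strict_error = None
try:
    decode_name(raw_name)
except UnicodeDecodeError as exc:
    strict_error = exc

# Option 1 (lossy): substitute U+FFFD for the undecodable byte.
lossy = decode_name(raw_name, "replace")

# Option 2 (reversible): smuggle the raw byte through as a lone surrogate,
# so the original bytes can be recovered when writing back to Git.
escaped = decode_name(raw_name, "surrogateescape")
round_tripped = escaped.encode("utf-8", "surrogateescape")
```

The lossy variant can no longer be mapped back to the original Git path, while the reversible variant yields strings that are not valid Unicode text, which is part of why the right behaviour is a judgment call.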

summary: - Impossible to import a given git repo in Launchpad
+ Doesn't handle non-utf8 characters
Changed in bzr-git:
status: New → Confirmed
Jelmer Vernooij (jelmer)
Changed in brz-git:
status: New → Triaged
importance: Undecided → Medium
Jelmer Vernooij (jelmer)
Changed in brz-git:
importance: Medium → Wishlist
Jelmer Vernooij (jelmer)
affects: brz-git → brz
summary: - Doesn't handle non-utf8 characters
+ git: Doesn't handle non-utf8 characters
tags: added: git
Jelmer Vernooij (jelmer)
Changed in brz:
status: Triaged → In Progress
assignee: nobody → Jelmer Vernooij (jelmer)
milestone: none → 3.1.1
Jelmer Vernooij (jelmer)
Changed in brz:
status: In Progress → Fix Released