git: Doesn't handle non-utf8 characters

Bug #1489872 reported by Nicolas DERIVE on 2015-08-28
This bug affects 1 person
Affects              Importance   Assigned to
Bazaar Git Plugin    Undecided    Unassigned
Breezy               Wishlist     Unassigned

Bug Description

When trying to import a Git repository (Navitia) into Launchpad, I got the following error:

2015-08-28 11:06:45 INFO Starting job.
2015-08-28 11:06:45 INFO Getting exising bzr branch from central store.
2015-08-28 11:06:45 INFO [chan bzr SocketAsChannelAdapter] Opened sftp connection (server version 3)
2015-08-28 11:06:45 INFO [chan bzr SocketAsChannelAdapter] Opened sftp connection (server version 3)
2015-08-28 11:06:45 INFO 57 bytes transferred
2015-08-28 11:06:46 INFO Importing branch.
2015-08-28 11:06:47 INFO Counting objects: 57459, done. 0
2015-08-28 11:07:22 INFO finding revisions to fetch:generating index 0/57459
2015-08-28 11:07:28 INFO finding revisions to fetch:generating index 0/57459
2015-08-28 11:07:32 INFO finding revisions to fetch 1
2015-08-28 11:07:37 INFO
Traceback (most recent call last):
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/scripts/code-import-worker.py", line 96, in <module>
    sys.exit(script.main())
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/scripts/code-import-worker.py", line 91, in main
    return import_worker.run()
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/lib/lp/codehosting/codeimport/worker.py", line 583, in run
    return self._doImport()
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/lib/lp/codehosting/codeimport/worker.py", line 737, in _doImport
    inter_branch.fetch(limit=revision_limit)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/branch.py", line 722, in fetch
    self.fetch_objects(stop_revision, fetch_tags=fetch_tags, limit=limit)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/branch.py", line 745, in fetch_objects
    determine_wants, self.source.mapping, limit=limit)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/fetch.py", line 718, in fetch_objects
    limit)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/fetch.py", line 516, in import_git_objects
    target_git_object_retriever, trees_cache)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/fetch.py", line 409, in import_git_commit
    False))
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/fetch.py", line 298, in import_git_tree
    lookup_file_id, allow_submodules=allow_submodules)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/fetch.py", line 298, in import_git_tree
    lookup_file_id, allow_submodules=allow_submodules)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/fetch.py", line 298, in import_git_tree
    lookup_file_id, allow_submodules=allow_submodules)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/fetch.py", line 312, in import_git_tree
    (child_base_mode, child_mode), store_updater, lookup_file_id)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17690/bzrplugins/git/fetch.py", line 119, in import_git_blob
    ie = cls(file_id, name.decode("utf-8"), parent_id)
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe8 in position 23: invalid continuation byte
Import failed:
Traceback (most recent call last):
Failure: twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition: process ended with exit code 1
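
The failing call in the traceback is a strict `name.decode("utf-8")` on a Git tree entry name. A minimal sketch of the same failure outside the importer follows. The file name `caf\xe8.txt` and the helper `describe_decode_failure` are hypothetical, chosen only to put a Latin-1 encoded byte 0xe8 in front of an ASCII character; the actual name in the Navitia repository is not shown in the log. The sketch is written for Python 3, whereas the traceback comes from Python 2.7.

```python
def describe_decode_failure(raw: bytes):
    """Return (offset, reason) if raw is not valid UTF-8, else None.

    Hypothetical helper for illustration; not part of bzr-git.
    """
    try:
        raw.decode("utf-8")  # strict decoding, as in import_git_blob
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


# Hypothetical Git entry name: "è" encoded as Latin-1 (single byte 0xe8).
name = b"caf\xe8.txt"
failure = describe_decode_failure(name)
if failure is not None:
    offset, reason = failure
    print("byte 0x%02x at position %d: %s" % (name[offset], offset, reason))
```

Byte 0xe8 is a valid UTF-8 lead byte for a three-byte sequence, so the decoder only rejects it when the following byte is not a continuation byte, which matches the "invalid continuation byte" wording in the log.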

Tags: git
William Grant (wgrant) on 2015-09-01
affects: launchpad → bzr-git

It's unclear what the best course of action is in situations like this. Bazaar uses Unicode internally, whereas Git just uses arbitrary byte strings.
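
One possible direction, sketched here purely as an illustration and not as what bzr-git or Breezy does, is to decode names with Python 3's `surrogateescape` error handler (PEP 383). It maps each undecodable byte to a lone surrogate code point, so the original bytes can be recovered exactly. The function names below are hypothetical, and the handler does not exist in Python 2.7, which the traceback above was produced under.

```python
def git_name_to_unicode(raw: bytes) -> str:
    """Decode a Git path, smuggling invalid bytes through as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def unicode_to_git_name(text: str) -> bytes:
    """Inverse of git_name_to_unicode: restore the original byte string."""
    return text.encode("utf-8", errors="surrogateescape")


# Hypothetical non-UTF-8 name (Latin-1 "è" is the single byte 0xe8).
raw = b"caf\xe8.txt"
text = git_name_to_unicode(raw)

# The round trip is lossless, and valid UTF-8 names are left untouched.
assert unicode_to_git_name(text) == raw
assert git_name_to_unicode("caf\u00e8.txt".encode("utf-8")) == "caf\u00e8.txt"
```

The trade-off is that the resulting string contains a lone surrogate and cannot be encoded with a strict UTF-8 encoder, so every place that writes such names out (inventories, terminals, other storage formats) would need to handle it deliberately.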

summary: - Impossible to import a given git repo in Launchpad
+ Doesn't handle non-utf8 characters
Changed in bzr-git:
status: New → Confirmed
Jelmer Vernooij (jelmer) on 2018-03-08
Changed in brz-git:
status: New → Triaged
importance: Undecided → Medium
Jelmer Vernooij (jelmer) on 2018-04-03
Changed in brz-git:
importance: Medium → Wishlist
Jelmer Vernooij (jelmer) on 2018-05-10
affects: brz-git → brz
summary: - Doesn't handle non-utf8 characters
+ git: Doesn't handle non-utf8 characters
tags: added: git