bzr-git does not preserve or ignore unknown extra fields

Bug #1372149 reported by Roman
This bug affects 5 people
Affects             Status         Importance   Assigned to      Milestone
Bazaar Git Plugin   Invalid        Undecided    Unassigned
Breezy              Invalid        Medium       Unassigned
Valentina           Fix Released   Critical     auto-dismine-1

Bug Description

I can't import source code from git. Here is the log:
2014-09-21 12:57:19 INFO Starting job.
2014-09-21 12:57:19 INFO Getting exising bzr branch from central store.
2014-09-21 12:57:20 INFO [chan bzr SocketAsChannelAdapter] Opened sftp connection (server version 3)
2014-09-21 12:57:20 INFO 35 bytes transferred
2014-09-21 12:57:29 INFO [chan bzr SocketAsChannelAdapter] Opened sftp connection (server version 3)
2014-09-21 12:57:30 INFO Importing branch.
2014-09-21 12:57:32 INFO Counting objects: 2724, done. 0
2014-09-21 12:57:34 INFO finding revisions to fetch:generating index 0/2724
2014-09-21 12:57:36 INFO finding revisions to fetch:generating index 0/2724
2014-09-21 12:57:36 INFO finding revisions to fetch 1/608
2014-09-21 12:57:36 INFO
Traceback (most recent call last):
  File "/srv/importd.launchpad.net/production/launchpad-rev-17196/scripts/code-import-worker.py", line 95, in <module>
    sys.exit(script.main())
  File "/srv/importd.launchpad.net/production/launchpad-rev-17196/scripts/code-import-worker.py", line 90, in main
    return import_worker.run()
  File "/srv/importd.launchpad.net/production/launchpad-rev-17196/lib/lp/codehosting/codeimport/worker.py", line 576, in run
    return self._doImport()
  File "/srv/importd.launchpad.net/production/launchpad-rev-17196/lib/lp/codehosting/codeimport/worker.py", line 730, in _doImport
    inter_branch.fetch(limit=revision_limit)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17196/bzrplugins/git/branch.py", line 722, in fetch
    self.fetch_objects(stop_revision, fetch_tags=fetch_tags, limit=limit)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17196/bzrplugins/git/branch.py", line 745, in fetch_objects
    determine_wants, self.source.mapping, limit=limit)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17196/bzrplugins/git/fetch.py", line 718, in fetch_objects
    limit)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17196/bzrplugins/git/fetch.py", line 484, in import_git_objects
    mapping.revision_id_foreign_to_bzr)
  File "/srv/importd.launchpad.net/production/launchpad-rev-17196/bzrplugins/git/mapping.py", line 334, in import_commit
    raise UnknownCommitExtra(commit, [item[0] for item in commit.extra])
bzrlib.plugins.git.errors.UnknownCommitExtra: Unknown extra fields in <Commit 56ead206c93f1ed9d6f1d46cbfa2a8e79cdad63c>: ['HG:rename', 'HG:rename', 'HG:rename', 'HG:rename', 'HG:rename', 'HG:rename', 'HG:rename', 'HG:rename', 'HG:rename', 'HG:rename', 'HG:rename', 'HG:rename', 'HG:rename', 'HG:rename', 'HG:rename', 'HG:rename'].
Import failed:
Traceback (most recent call last):
Failure: twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition: process ended with exit code 1.

Can anyone help me? I can't understand what's wrong.
This git repository (https://github.com/dismine/Valentina.git) is only a mirror, and I used hg-git for pushing changes.

William Grant (wgrant)
affects: launchpad → bzr-git
Susan Spencer (susan-spencer) wrote :

bzr-git is not ignoring metadata fields that it doesn't understand.

Jelmer Vernooij (jelmer)
summary: - import failure with bzrlib.plugins.git.errors.UnknownCommitExtra
+ bzr-git does not support HG:rename commit header fields
Jelmer Vernooij (jelmer) wrote : Re: bzr-git does not support HG:rename commit header fields

The C git developers discourage applications like hg-git from extending the git headers.

Throwing an error when unknown fields are encountered is intended behaviour. bzr-git can't just drop unknown metadata since that makes it impossible to reserialize the original Git objects later. Reserializing is necessary since the original git objects are used as delta bases.

bzr-git could store all unknown headers in an extra metadata field. We don't do this since it means we can't change the mapping of any unknown headers later without breaking all existing bzr-git users. Instead, since git commit header extensions are so rare, we require explicit support for them in bzr-git.
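
To illustrate the reserialization point above: a git object id is the SHA-1 of a short type-and-length header followed by the object body, so a commit that is written back without one of its header lines gets a different id. The following is a minimal sketch using only the Python standard library. The commit body, the identities, and the HG:rename value are invented for illustration, and the helper names are hypothetical; this is not bzr-git or Dulwich code.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the id git assigns to an object: sha1(b"<type> <len>\\0" + body)."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def drop_header(body: bytes, name: bytes) -> bytes:
    """Remove every header line named `name` (continuation lines are not handled)."""
    headers, sep, message = body.partition(b"\n\n")
    kept = [line for line in headers.split(b"\n")
            if not line.startswith(name + b" ")]
    return b"\n".join(kept) + sep + message


# Hypothetical commit body; every value here is made up for the example.
COMMIT_WITH_EXTRA = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1411300000 +0000\n"
    b"committer A U Thor <author@example.com> 1411300000 +0000\n"
    b"HG:rename old.txt:new.txt\n"
    b"\n"
    b"Rename a file\n"
)

stripped = drop_header(COMMIT_WITH_EXTRA, b"HG:rename")
original_id = git_object_id("commit", COMMIT_WITH_EXTRA)
stripped_id = git_object_id("commit", stripped)

# The two ids differ, so the stripped commit can no longer stand in
# for the original object.
print(original_id)
print(stripped_id)
print(original_id != stripped_id)
```

Any importer that wants to recreate the original object later therefore has to keep every header byte somewhere, which is the constraint described in the comment above.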

Changed in bzr-git:
status: New → Confirmed
Susan Spencer (susan-spencer) wrote : Fwd: HG:rename tags not removed by hg-git

Hi Jelmer,

The HG guys are interested in this bug but they think it's a bzr-git issue.
Does Dulwich preserve or ignore extra fields?

---------- Forwarded message ----------
From: Siddharth Agarwal <email address hidden>
Date: Tue, Sep 30, 2014 at 12:15 PM
Subject: Re: HG:rename tags not removed by hg-git
To: <email address hidden>, Susan Spencer <email address hidden>
Cc: Siddharth Agarwal <email address hidden>

On 09/30/2014 10:13 AM, Augie Fackler wrote:

> On Tue, Sep 30, 2014 at 10:32 AM, Susan Spencer <email address hidden>
> wrote:
>
>> Should the 'HG:rename' tags be transferred from hg to git?
>> Git doesn't generate an error message from these tags.
>> However, here is my problem with these tags...
>>
>> My project is converted from hg/bitbucket to git/github to bzr/launchpad.
>>
>> bzr returns an error message on the HG:rename tags as:
>>
>> raise UnknownCommitExtra(commit, [item[0] for item in commit.extra])
>> bzrlib.plugins.git.errors.UnknownCommitExtra: Unknown extra fields in
>> <Commit 56ead206c93f1ed9d6f1d46cbfa2a8e79cdad63c>: ['HG:rename',
>> 'HG:rename', 'HG:rename', 'HG:rename', 'HG:rename', 'HG:rename',
>> 'HG:rename', 'HG:rename', 'HG:rename', 'HG:rename', 'HG:rename',
>> 'HG:rename', 'HG:rename', 'HG:rename', 'HG:rename', 'HG:rename'].
>> Import failed
>>
> You're probably doing the conversion using a relatively recent hg-git,
> which just started storing this data in the hidden metadata section of
> git commits. This is (probably) a bzr-git bug wherein they're choking
> on metadata fields they don't understand, rather than just preserving
> or dropping them as appropriate.
>

Yes, please report this to bzr-git. bzr-git isn't matching git's or
Dulwich's behavior of preserving but otherwise ignoring unknown extra
fields.

- Siddharth

Susan Spencer (susan-spencer) wrote : Re: bzr-git does not support HG:rename commit header fields

Could not import probably because of bzr-git bug #963525 or bug #1084403
https://launchpad.net/launchpad/+bug/963525
https://launchpad.net/launchpad/+bug/1084403

Another project is also failing to import code from a previously working git import, returning the same error code when importing source code from a git repo into bzr:
https://code.launchpad.net/~gnome-shell-extensions/gnome-shell-extensions/appindicator-support-head

summary: - bzr-git does not support HG:rename commit header fields
+ bzr-git does not preserve or ignore unknown extra fields
Susan Spencer (susan-spencer) wrote : Fwd: HG:rename tags not removed by hg-git

Here is background info from an hg-git developer about the
change made to hg-git which may require a change to bzr-git:

---------- Forwarded message ----------
From: Augie Fackler

<snip>
You're probably doing the conversion using a relatively recent hg-git,
which just started storing this data in the hidden metadata section of
git commits. This is (probably) a bzr-git bug wherein they're choking
on metadata fields they don't understand, rather than just preserving
or dropping them as appropriate.

Siddharth Agarwal (sid-w) wrote :

Note that the HG:rename fields aren't the only ones hg-git stores. All fields that hg-git stores will have an 'HG:' prefix, though.
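
As a rough illustration of how a tool could tell hg-git's fields apart from other extra headers, the sketch below parses the header section of a raw commit and groups the non-standard fields by the 'HG:' prefix. It is an assumption-laden example, not code from hg-git, Dulwich, or bzr-git: the function names, the sample commit, and the `x-custom` field are all made up, and the set of standard header names is only what this sketch chooses to treat as known.

```python
def parse_commit_headers(raw: bytes):
    """Split a raw commit body into ordered (name, value) headers and the message.

    Lines starting with a space continue the previous header's value.
    """
    header_block, _, message = raw.partition(b"\n\n")
    headers = []
    for line in header_block.split(b"\n"):
        if line.startswith(b" ") and headers:
            name, value = headers[-1]
            headers[-1] = (name, value + b"\n" + line[1:])
        else:
            name, _, value = line.partition(b" ")
            headers.append((name, value))
    return headers, message


# Header names this sketch treats as standard (an assumption for the example).
STANDARD = {b"tree", b"parent", b"author", b"committer",
            b"encoding", b"mergetag", b"gpgsig"}


def classify_extra(raw: bytes):
    """Return (hg_fields, other_unknown) for the non-standard headers."""
    headers, _ = parse_commit_headers(raw)
    extra = [(n, v) for n, v in headers if n not in STANDARD]
    hg_fields = [(n, v) for n, v in extra if n.startswith(b"HG:")]
    other = [(n, v) for n, v in extra if not n.startswith(b"HG:")]
    return hg_fields, other


# Hypothetical commit; all values are invented.
SAMPLE = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1411300000 +0000\n"
    b"committer A U Thor <author@example.com> 1411300000 +0000\n"
    b"HG:rename old.txt:new.txt\n"
    b"HG:extra branch:default\n"
    b"x-custom value\n"
    b"\n"
    b"Rename a file\n"
)

hg_fields, other = classify_extra(SAMPLE)
print([name for name, _ in hg_fields])
print([name for name, _ in other])
```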

Changed in bzr-git:
assignee: nobody → Jelmer Vernooij (jelmer)
Roman (dismine) wrote : Re: [Bug 1372149] Re: bzr-git does not preserve or ignore unknown extra fields

Great news.


Jelmer Vernooij (jelmer)
Changed in bzr-git:
assignee: Jelmer Vernooij (jelmer) → nobody
Jelmer Vernooij (jelmer) wrote : Re: [Bug 1372149] [NEW] bzr-git does not preserve or ignore unknown extra fields

On Tue, Oct 21, 2014 at 04:04:53AM -0000, Launchpad Bug Tracker wrote:
> ** Affects: bzr-git
> Importance: Undecided
> Assignee: Jelmer Vernooij (jelmer)
> Status: Confirmed

Please don't assign bzr-git bugs to me.

I am no longer working on bzr-git; it is currently unmaintained.

That said, I still think this is primarily an hg-git issue.

Jelmer

Susan Spencer (susan-spencer) wrote :

Hi Jelmer,

Since no one is supporting bzr tools anymore,
that leaves projects like mine (Valentina)
permanently broken in Launchpad.

The hg-git "issue" is that hg-git is now pulling *all* fields from hg
into git, which isn't a bug; it is considered to be an improvement.

According to the hg-git team, Dulwich either ignores
or preserves additional fields that it doesn't recognize.
The expectation is that bzr-git, which uses Dulwich,
should also either ignore or preserve these fields, but
instead bzr-git just crashes.

It's too bad that Canonical has stopped supporting bzr tools.
I heard a rumour that there was discussion about adding
support for git on Launchpad to remove dependence on
the dead bzr tools, but that this effort was not well received.
What a shame.

Jelmer Vernooij (jelmer) wrote :

Hi Susan,

See this thread, which sprung up on the Git list after I added support for extra commit headers in Dulwich: http://git.661346.n2.nabble.com/extra-headers-in-commit-objects-td4508608.html

The consensus among Git developers seems to be that defining new header fields should be left up to the Git developers. What hg-git is doing goes against that.

bzr-git doesn't completely adhere to the guidelines in that thread either since it doesn't just drop unknown fields. There are good reasons it can't just ignore headers it doesn't know about though. See my earlier comments for a more thorough explanation:

Throwing an error when unknown fields are encountered is intended behaviour. bzr-git can't just drop unknown metadata since that makes it impossible to reserialize the original Git objects later. Reserializing is necessary since the original git objects are used as delta bases.

bzr-git could store all unknown headers in an extra metadata field. We don't do this since it means we can't change the mapping of any unknown headers later without breaking all existing bzr-git users. Instead, since git commit header extensions are so rare, we require explicit support for them in bzr-git.

Dulwich is used for parsing and creating Git objects in bzr-git; bzr-git doesn't use Dulwich to store data, it uses native Bazaar objects.

Is your project itself maintained in Git or in Mercurial? If it is in Mercurial, you could disable the setting of extra headers in hg-git or alternatively run a cronjob somewhere that uses hg-fastexport and bzr-fastimport to create a bzr branch from a mercurial branch.

Susan Spencer (susan-spencer) wrote : Re: [Bug 1372149] Re: bzr-git does not preserve or ignore unknown extra fields

Hi Jelmer,

AFAIK There isn't a parameter for users to disable extra headers when
importing code into Github.
https://porter.github.com/new

Am I correct in understanding that you added the functionality in Dulwich
to ignore extra headers so it wouldn't crash when encountering extra
headers?
http://git.661346.n2.nabble.com/extra-headers-in-commit-objects-td4508608.html
Could you add this functionality to bzr-git, so that bzr-git would be
compliant
with the C git rule to ignore extra headers that aren't recognized, after
the encoding header?


Jelmer Vernooij (jelmer) wrote :

On Wed, Oct 22, 2014 at 11:03:17AM -0500, Susan Spencer wrote:
> AFAIK There isn't a parameter for users to disable extra headers when
> importing code into Github.
> https://porter.github.com/new
It might not be exposing the hg-git settings if it uses hg-git underneath. :-/

> Am I correct in understanding that you added the functionality in Dulwich
> to ignore extra headers so it wouldn't crash when encountering extra
> headers?
> http://git.661346.n2.nabble.com/extra-headers-in-commit-objects-td4508608.html
Yes.

> Could you add this functionality to bzr-git, so that bzr-git would be
> compliant
> with the C git rule to ignore extra headers that aren't recognized, after
> the encoding header?
See my earlier answers. We can't do that, because bzr-git needs to
be able to reproduce the full contents of a git object that it imports. If
it dropped any fields, that would be impossible.

Jelmer


Roman (dismine)
Changed in valentina:
status: New → Fix Released
Changed in valentina (Ubuntu):
status: New → Fix Released
Changed in valentina:
importance: Undecided → Critical
Jelmer Vernooij (jelmer)
no longer affects: valentina (Ubuntu)
Jendrik Seipp (jendrikseipp) wrote :

This issue prevents code imports for me as well (https://code.launchpad.net/~jendrikseipp/rednotebook/github-mirror). Is there some workaround that I can use?

Roman (dismine) wrote :

Jendrik, do you use mercurial?

Jendrik Seipp (jendrikseipp) wrote :

Apparently, one contributor used mercurial for pushing his changes.

Roman (dismine) wrote :

This means he probably used the hg-git extension. I had the same issue with export. As I discovered, you can't use an hg-git version higher than 0.6.1.

In my case I use mercurial->git->launchpad. To restore my repository I deleted it completely and redid the export with hg-git v0.6.1. The export has worked without errors since then.

Jendrik Seipp (jendrikseipp) wrote :

Thanks for this info. However, I think it would be nice if there was a
solution that didn't involve deleting the public github repo. Is there
any chance this might be handled by the git importer?

Roman (dismine) wrote :
Jendrik Seipp (jendrikseipp) wrote :

Ah, yes, I forgot about mirroring the git repo on launchpad. Thanks,
that should work.

Jelmer Vernooij (jelmer)
Changed in brz-git:
status: New → Triaged
importance: Undecided → Medium
Jelmer Vernooij (jelmer) wrote :

I'm marking this as invalid, since:

* Ignoring these fields results in inability to do incremental pulls from Git
* Just preserving them is probably not the right approach going forward, e.g. we may want to convert them to some specific metadata field in bzr

Feel free to file a separate bug for specific fields (e.g. HG:extra) if you'd like those to be supported.
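
The "explicit support" policy discussed in this thread, where an import fails on any extra field that has not been given a defined mapping, might be sketched roughly as below. This is only an illustration of the idea: the exception class, the function, and the allowlist are hypothetical names, not bzr-git's actual implementation, and the commit id and field values are placeholders.

```python
class UnknownExtraFields(Exception):
    """Raised when a commit carries extra fields with no defined mapping."""

    def __init__(self, commit_id, fields):
        self.commit_id = commit_id
        self.fields = fields
        super().__init__(f"Unknown extra fields in {commit_id}: {fields}")


# Hypothetical allowlist: fields that have an agreed, stable mapping.
# It starts empty, so every extra field is rejected by default.
SUPPORTED_EXTRA = frozenset()


def check_extra(commit_id, extra, supported=SUPPORTED_EXTRA):
    """Fail loudly unless every (name, value) pair in `extra` is supported."""
    unknown = [name for name, _ in extra if name not in supported]
    if unknown:
        raise UnknownExtraFields(commit_id, unknown)


extra = [("HG:rename", "old.txt:new.txt")]  # made-up value

try:
    check_extra("example-commit", extra)
except UnknownExtraFields as exc:
    print(exc)

# Once a field has explicit support, the same commit passes the check.
check_extra("example-commit", extra, supported={"HG:rename"})
```

The design trade-off is the one stated above: failing early keeps the mapping free to change later, at the cost of rejecting repositories that contain such fields until support is added.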

Changed in brz-git:
status: Triaged → Invalid
Changed in bzr-git:
status: Confirmed → Invalid
Jelmer Vernooij (jelmer)
affects: brz-git → brz