git performance suffers for large repositories

Bug #829298 reported by Davi Arnaut
This bug affects 2 people
Affects            Status   Importance  Assigned to  Milestone
Bazaar Git Plugin  Triaged  Medium      Unassigned
Breezy             Triaged  Medium      Unassigned

Bug Description

bzr dpush of a large repository such as lp:mysql-server/5.5 can run for days.

Jelmer Vernooij (jelmer) wrote :

bzr-git's push support isn't yet on par with its fetch support, which works reasonably well for large repositories the size of MySQL or the Linux kernel.

This is partly due to a lack of support for generating proper git pack files in dulwich (bug 562673) and partly due to the difference between the bzr and git file formats. Some sort of cache (not of fulltexts, but a mapping from git SHA to bzr object identifier) might help iterate more quickly over the git objects derived from a bzr repository.
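The cache suggested above can be pictured roughly as follows. This is a minimal sketch assuming a plain in-memory dict; the names `GitShaMap` and `git_blob_sha` and the `(kind, file_id, revision_id)` key layout are hypothetical and are not bzr-git's actual idmap API. It only shows the shape of the idea: store identifiers rather than fulltexts, so a git SHA can be resolved back to the bzr object it came from (and vice versa) without regenerating and rehashing the object.

```python
import hashlib
from typing import Dict, Optional, Tuple

# Hypothetical bzr-side identifier: (object kind, file_id, revision_id).
BzrKey = Tuple[str, bytes, bytes]


def git_blob_sha(data: bytes) -> str:
    """Compute the git SHA-1 of a blob: sha1(b"blob <len>\\0" + data)."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class GitShaMap:
    """Hypothetical two-way cache between git SHAs and bzr identifiers.

    Only identifiers are stored, never fulltexts, so memory grows with
    the number of objects rather than with their size.
    """

    def __init__(self) -> None:
        self._by_sha: Dict[str, BzrKey] = {}
        self._by_key: Dict[BzrKey, str] = {}

    def add(self, sha: str, key: BzrKey) -> None:
        # Record the mapping in both directions.
        self._by_sha[sha] = key
        self._by_key[key] = sha

    def lookup_sha(self, sha: str) -> Optional[BzrKey]:
        # git SHA -> bzr identifier, or None on a cache miss.
        return self._by_sha.get(sha)

    def lookup_key(self, key: BzrKey) -> Optional[str]:
        # bzr identifier -> git SHA, or None on a cache miss.
        return self._by_key.get(key)


# Usage: remember which bzr file text produced a given blob.
idmap = GitShaMap()
sha = git_blob_sha(b"")
idmap.add(sha, ("blob", b"file-id-1", b"rev-1"))
print(idmap.lookup_sha(sha))  # ('blob', b'file-id-1', b'rev-1')
```

A persistent version would keep the same two lookups but back them with an on-disk store, so the mapping survives between pushes.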

Changed in bzr-git:
status: New → Triaged
importance: Undecided → Medium
importance: Medium → High
importance: High → Medium
Davi Arnaut (davi) wrote :

This makes it practically impossible to have a viable/updateable fast-export git repository of MySQL, as bzr fastexport does not seem to properly handle all revisions.

For what it's worth, I also tested the branch proposed in bug 562673 and it does not seem to yield any perceptible improvement.

Davi Arnaut (davi) wrote :

s/updateable fast-export/updateable/

Davi Arnaut (davi) wrote :

It is still running after 80 hours and the "pushing revisions" phase has decided to start using 25GB of memory.

Jelmer Vernooij (jelmer) wrote :

The branch for bug 562673 only contains some refactoring that is necessary for the fix; it doesn't actually contain the fix itself yet.

bzr-git's push support isn't really ready for large scale pushes yet because of bug 562673. It's mainly useful for contributing back to upstream branches that are in git (incremental pushes work fine).

What issue are you hitting with bzr-fastexport?

Davi Arnaut (davi) wrote :

There is an error (in git fast-import, I guess) about not being able to update ref/master (or something) due to a missing commit. But I don't want to keep the "marks" files around, and I also want repeatable, stable conversions, which bzr-fastexport does not seem capable of.

Davi Arnaut (davi) wrote :

> It's mainly useful for contributing back to upstream branches that are in git (incremental pushes work fine).

I'm attempting to accomplish something similar: a downstream branch in git where people can pull/merge from the upstream in bzr.

John A Meinel (jameinel) wrote : Re: [Bug 829298] Re: bzr-git performance suffers for large repositories


On 8/23/2011 10:23 PM, Davi Arnaut wrote:
>> It's mainly useful for contributing back to upstream branches that
>> are in git (incremental pushes work fine).
>
> I'm attempting to accomplish something similar, have a downstream
> branch in git where people can pull/merge from the upstream in
> bzr.
>

You could try doing the push incrementally, rather than the whole
history from the beginning. Depending on how bzr-git handles
incremental updates, you could push say 1000 revisions at a time.

John
=:->


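John's suggestion of pushing in chunks could be driven by a small loop like the one below. It is a sketch only, assuming the mainline history is available oldest-first as a list of revision ids; `push_up_to` is a hypothetical callback standing in for whatever performs one incremental push, not a real bzr or bzr-git function. Because each push is incremental, every step only has to transfer the revisions added since the previous tip.

```python
from typing import Callable, Iterator, List, Sequence


def batch_tips(revisions: Sequence[str], batch_size: int = 1000) -> Iterator[str]:
    """Yield the revision id that ends each batch, oldest batch first.

    The last batch may be shorter than batch_size and always ends at
    the newest revision.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for end in range(batch_size, len(revisions) + batch_size, batch_size):
        yield revisions[min(end, len(revisions)) - 1]


def push_incrementally(revisions: Sequence[str],
                       push_up_to: Callable[[str], None],
                       batch_size: int = 1000) -> List[str]:
    """Push history in chunks by repeatedly pushing up to a later tip.

    push_up_to is a hypothetical callback that performs one push.
    Returns the list of tips that were pushed, in order.
    """
    pushed: List[str] = []
    for tip in batch_tips(revisions, batch_size):
        push_up_to(tip)
        pushed.append(tip)
    return pushed


# Usage with a fake 2500-revision mainline and a callback that just prints.
history = [f"rev-{i}" for i in range(1, 2501)]
tips = push_incrementally(history, lambda tip: print("pushing up to", tip))
print(tips)  # ['rev-1000', 'rev-2000', 'rev-2500']
```

Batches here count mainline revisions only, so a tip that merges a long side branch will bring more than batch_size revisions with it.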
William Grant (wgrant) wrote : Re: bzr-git performance suffers for large repositories

A significant part of the slowness (but not the space-inefficiency) is that blobs only enter the idmap when there's a content_changed event. If an InventoryFile shows up with a new revision but without new content, it can cause the file to be rehashed in every subsequent commit until the content actually changes.

Hacking the fallback in _tree_to_objects to forcibly cache any misses makes things several times faster on long complex histories.
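The rehashing effect described above can be reproduced with a toy model. This is a simplified simulation, not bzr-git's code: each commit lists every file as `(file_id, last-changed revision, content)`, the cache is keyed by `(file_id, revision)`, and the content_changed event is approximated by comparing against the file's previous content. The names `blob_sha` and `count_hashes` are hypothetical. Once a file gets a new revision without new content, the "cache only on content change" policy misses on that key in every later commit, while caching every miss hashes it only once.

```python
import hashlib
from typing import Dict, List, Tuple

# A file as seen in one commit: (file_id, last-changed revision, content).
FileVersion = Tuple[str, str, bytes]


def blob_sha(data: bytes) -> str:
    """git blob SHA-1: sha1(b"blob <len>\\0" + data)."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def count_hashes(history: List[List[FileVersion]], cache_misses: bool) -> int:
    """Return how many times blob content is hashed while walking history.

    cache_misses=False only caches when content changed (the behaviour
    described above); cache_misses=True caches every miss (the hack).
    """
    cache: Dict[Tuple[str, str], str] = {}
    previous: Dict[str, bytes] = {}
    hashes = 0
    for tree in history:
        for file_id, revision, content in tree:
            key = (file_id, revision)
            if key in cache:
                continue  # cache hit: no rehash needed
            sha = blob_sha(content)  # cache miss: hash the blob again
            hashes += 1
            content_changed = previous.get(file_id) != content
            if content_changed or cache_misses:
                cache[key] = sha
            previous[file_id] = content
    return hashes


# "f" gets a new revision r2 in commit 2 with identical content,
# then stays at r2 for the remaining 8 commits.
history = [[("f", "r1", b"a")]] + [[("f", "r2", b"a")] for _ in range(9)]
print(count_hashes(history, cache_misses=False))  # 10
print(count_hashes(history, cache_misses=True))   # 2
```

With many such files and a long history, the difference grows roughly with files × commits, which is consistent with the several-fold speedup reported for forcibly caching misses.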

Jelmer Vernooij (jelmer)
Changed in brz-git:
status: New → Triaged
importance: Undecided → Medium
Jelmer Vernooij (jelmer)
tags: added: performance
Jelmer Vernooij (jelmer)
summary: - bzr-git performance suffers for large repositories
+ git performance suffers for large repositories
tags: added: git
affects: brz-git → brz