pulling from dirstate-with-subtree into rich-root repo surprisingly slow

Bug #174627 reported by Dato Simó
Affects: Bazaar
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

(This comes from http://bugs.debian.org/454504, and I can confirm it myself.)

-8<-
~roland/debian/bzr-repo/gforge/ holds my main bzr repository, stored
as dirstate-with-subtree with no working trees. Trying to clone
branches stored in there gives wildly varying performance depending
on the format of the destination repository. In particular, rich-root
is *slow*:

guest@mirexpress:~/repo/gforge-rich-root$ bzr init-repo --no-trees --format=rich-root .
guest@mirexpress:~/repo/gforge-rich-root$ time bzr branch ~roland/debian/bzr-repo/gforge/debian/sid/ sid
Branched 5068 revision(s).

real 108m52.212s
user 89m24.975s
sys 1m40.582s

[...]

guest@mirexpress:~/repo/gforge-pack$ bzr init-repo --no-trees --format=pack-0.92-subtree .
guest@mirexpress:~/repo/gforge-pack$ time bzr branch ~roland/debian/bzr-repo/gforge/debian/sid/
Branched 5068 revision(s).

real 5m30.865s
user 2m31.997s
sys 0m11.425s

[...]

guest@mirexpress:~/repo/gforge-dirstate$ bzr init-repo --no-trees --format=dirstate-with-subtree .
guest@mirexpress:~/repo/gforge-dirstate$ time bzr branch ~roland/debian/bzr-repo/gforge/upstream-svn/trunk/
Branched 5068 revision(s).

real 4m36.514s
user 1m19.449s
sys 0m13.533s

->8-

 affects /products/bzr

Aaron Bentley (abentley) wrote : Re: [Bug 174627] Large performance regression when pulling from dirstate-with-subtree into rich-root repo


Adeodato Simó wrote:
> Public bug reported:
>
> (This comes from http://bugs.debian.org/454504, and I can confirm it
> myself.)

This is not a regression-- this is the fastest we've ever been able to
pull from dirstate-with-subtree into a rich-root repo.

Aaron

John A Meinel (jameinel) wrote : Re: Large performance regression when pulling from dirstate-with-subtree into rich-root repo

Why is it this slow?

Is it because the source format is 7 and the target format is 6? So it is rebuilding all of the inventories during pull?

I'll correct the title, though.

John A Meinel (jameinel) wrote :

This is mostly an issue for people who have been using bzr-svn, and then migrate through the new release. I don't know of any other users of --dirstate-with-subtree.

It also points out that we should probably be trying to make rich-root-pack the default format. It takes a bit more to upgrade from dirstate-tags (again because of rebuilding the inventory files). But it would help if we could get away from having multiple disk formats concurrently recommended. (By default we encourage pack-0.92, but bzr-svn recommends rich-root/rich-root-pack).

Changed in bzr:
importance: Undecided → Medium
status: New → Triaged
Aaron Bentley (abentley) wrote :

Actually, I'm not sure which converter it's using. I think mine doesn't allow this conversion, because it won't always succeed.

Andrew Bennetts (spiv) wrote :

I have a few bzr-svn trees I want to convert from dirstate-with-subtree to rich-root-pack, so I'm also wishing this was faster. On converting my import of python trunk, I'm seeing worse than 1 revision/sec, which seems pretty terrible.

One cheap (but slight) improvement is this patch:

=== modified file 'bzrlib/repofmt/pack_repo.py'
--- bzrlib/repofmt/pack_repo.py 2008-02-19 03:58:32 +0000
+++ bzrlib/repofmt/pack_repo.py 2008-02-27 07:24:26 +0000
@@ -1753,6 +1752,7 @@
         self.weavestore = weavestore
         # XXX for check() which isn't updated yet
         self._transport = weavestore._transport
+        self._transport_parent_dir = transport.clone('..')

     def get_weave_or_empty(self, file_id, transaction):
         """Get a 'Knit' backed by the .tix indices.
@@ -1769,7 +1769,7 @@
             add_callback=file_id_index.add_nodes,
             deltas=True, parents=True)
         return knit.KnitVersionedFile('text:' + file_id,
-            self.transport.clone('..'),
+            self._transport_parent_dir,
             None,
             index=knit_index,
             access_method=self.repo._pack_collection.text_index.knit_access,

That avoids many identical clones of a local transport, which is a little bit expensive.

Another thing that pops up in my profiling (20% according to lsprof/kcachegrind) is "has_version". That seems pretty high; I wonder if maybe newly added revisions aren't having their metadata cached? What I suspect is that, as it converts an essentially linear history, it might be doing:
  * add revision X
  * check has_revision(parents of X)
    * first revision, so has to hit disk
  * then add revision X+1 (which has X as the parent)
  * check has_revision(X)
    * even though we just added X, maybe we still hit disk here?
  * rinse and repeat...
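
The suspected access pattern in the list above can be sketched with a toy model. This is not bzrlib code: `DiskIndex`, `CachingIndex` and `convert_linear_history` are hypothetical names, and a "disk hit" is only a counter. It shows how checking the parent of each new revision costs one lookup per revision unless freshly added revisions are remembered in memory.

```python
class DiskIndex:
    """Toy stand-in for an on-disk revision index (hypothetical, not bzrlib)."""

    def __init__(self, existing=()):
        self._on_disk = set(existing)
        self.disk_reads = 0

    def add(self, revision_id):
        self._on_disk.add(revision_id)

    def has_revision(self, revision_id):
        # Every lookup is counted as one simulated disk hit.
        self.disk_reads += 1
        return revision_id in self._on_disk


class CachingIndex(DiskIndex):
    """Same index, but it remembers revisions it has just written."""

    def __init__(self, existing=()):
        super().__init__(existing)
        self._known = set()

    def add(self, revision_id):
        super().add(revision_id)
        self._known.add(revision_id)

    def has_revision(self, revision_id):
        if revision_id in self._known:
            return True  # answered from memory, no disk hit
        return super().has_revision(revision_id)


def convert_linear_history(index, base, count):
    """Add `count` revisions in a straight line on top of `base`.

    Before adding each revision, check that its parent is present,
    mirroring the add / has_revision sequence described above.
    Returns the number of simulated disk hits.
    """
    parent = base
    for n in range(count):
        if not index.has_revision(parent):
            raise ValueError("missing parent %r" % (parent,))
        revision_id = "rev-%d" % n
        index.add(revision_id)
        parent = revision_id
    return index.disk_reads


naive_reads = convert_linear_history(DiskIndex(["base"]), "base", 1000)
cached_reads = convert_linear_history(CachingIndex(["base"]), "base", 1000)
print(naive_reads, cached_reads)
```

In this model the uncached index does one lookup per converted revision, while the caching variant only hits "disk" for the pre-existing base revision. Whether bzrlib's index layers actually behave like the uncached case is exactly the open question here.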

I tried to verify this theory, but I don't know my way around the various index layers well enough to know if I'm looking in the right place.

Even if that's true, a 20% improvement on 1 per second is still pretty slow when trying to convert 30000 revisions.
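
To put rough numbers on that, here is a back-of-the-envelope calculation. It assumes a steady 1 revision per second and reads "20% improvement" as removing 20% of the total run time; both are simplifying assumptions, and `conversion_hours` is a hypothetical helper.

```python
def conversion_hours(revisions, revs_per_second, fraction_saved=0.0):
    """Estimated wall-clock hours to convert `revisions` revisions.

    `fraction_saved` is the share of total run time assumed to be
    eliminated by an optimisation (0.20 for the has_version case).
    """
    seconds = revisions / revs_per_second
    return seconds * (1.0 - fraction_saved) / 3600.0


baseline = conversion_hours(30000, 1.0)
improved = conversion_hours(30000, 1.0, fraction_saved=0.20)
print("baseline: %.1f h, with 20%% saved: %.1f h" % (baseline, improved))
```

Under these assumptions the conversion drops from a bit over 8 hours to a bit under 7, which is still a long run.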

Robert Collins (lifeless) wrote :

We've heavily optimised cross-format conversions during the 2.0 development cycle.

Changed in bzr:
status: Triaged → Fix Released
milestone: none → 2.0.0