Bazaar

pulling from dirstate-with-subtree into rich-root repo surprisingly slow

Bug #174627 reported by Dato Simó on 2007-12-07

Affects: Bazaar
Status: Fix Released
Importance: Medium
Assigned to: Unassigned
Milestone: Bazaar 2.0.0 "Instant Karma"

Bug Description

(This comes from http://bugs.debian.org/454504, and I can confirm it myself.)

-8<-
~roland/debian/bzr-repo/gforge/ holds my main bzr repository, stored
as dirstate-with-subtree with no working trees. Trying to clone
branches stored in there gives wildly varying performance depending
on the format of the destination repository. In particular, rich-root
is *slow*:

guest@mirexpress:~/repo/gforge-rich-root$ bzr init-repo --no-trees --format=rich-root .
guest@mirexpress:~/repo/gforge-rich-root$ time bzr branch ~roland/debian/bzr-repo/gforge/debian/sid/ sid
Branched 5068 revision(s).

real 108m52.212s
user 89m24.975s
sys 1m40.582s

[...]

guest@mirexpress:~/repo/gforge-pack$ bzr init-repo --no-trees --format=pack-0.92-subtree .
guest@mirexpress:~/repo/gforge-pack$ time bzr branch ~roland/debian/bzr-repo/gforge/debian/sid/
Branched 5068 revision(s).

real 5m30.865s
user 2m31.997s
sys 0m11.425s

[...]

guest@mirexpress:~/repo/gforge-dirstate$ bzr init-repo --no-trees --format=dirstate-with-subtree .
guest@mirexpress:~/repo/gforge-dirstate$ time bzr branch ~roland/debian/bzr-repo/gforge/upstream-svn/trunk/
Branched 5068 revision(s).

real 4m36.514s
user 1m19.449s
sys 0m13.533s

->8-
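The gap is easier to see once the quoted `time` output is converted to seconds. The Python sketch below (the helper name `to_seconds` is ours, not part of bzr) parses the `XmY.YYYs` form and compares the two `debian/sid` runs, which both report 5068 revisions.

```python
import re

def to_seconds(stamp):
    """Convert `time` output such as '108m52.212s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError("unexpected time format: %r" % stamp)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

REVISIONS = 5068  # "Branched 5068 revision(s)." in both sid runs

rich_root = to_seconds("108m52.212s")   # real time, --format=rich-root
pack_subtree = to_seconds("5m30.865s")  # real time, --format=pack-0.92-subtree

slowdown = rich_root / pack_subtree
print("slowdown: %.1fx" % slowdown)
print("rich-root: %.2f rev/s" % (REVISIONS / rich_root))
print("pack-0.92-subtree: %.1f rev/s" % (REVISIONS / pack_subtree))
```

On these numbers the rich-root clone is roughly 20 times slower in wall-clock time and runs at under one revision per second.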

Aaron Bentley (abentley) wrote on 2007-12-07: Re: [Bug 174627] Large performance regression when pulling from dirstate-with-subtree into rich-root repo

Adeodato Simó wrote:
> Public bug reported:
>
> (This comes from http://bugs.debian.org/454504, and I can confirm it
> myself.)

This is not a regression-- this is the fastest we've ever been able to
pull from dirstate-with-subtree into a rich-root repo.

Aaron

John A Meinel (jameinel) wrote on 2007-12-07: Re: Large performance regression when pulling from dirstate-with-subtree into rich-root repo

Why is it this slow?

Is it because the source format is 7 and the target format is 6, so that it has to rebuild all of the inventories during the pull?

I'll correct the title, though.
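John's question can be illustrated with a toy model. The sketch below is hypothetical: it does not use bzrlib, and its XML layout only stands in for the real inventory serializers (the "7" and "6" labels mirror the numbers in the question). When source and target share a serialization format, a fetch can copy the stored bytes as they are; when they differ, every inventory must be parsed and written out again, so the work grows with the number of revisions times the size of each inventory.

```python
import xml.etree.ElementTree as ET

def serialize(entries, fmt):
    """Toy inventory serializer: one XML document per revision (hypothetical layout)."""
    root = ET.Element("inventory", format=fmt)
    for file_id, name in sorted(entries.items()):
        ET.SubElement(root, "file", file_id=file_id, name=name)
    return ET.tostring(root)

def fetch_inventories(blobs, source_fmt, target_fmt):
    """Return (target_blobs, entries_parsed) for a toy fetch between repositories."""
    if source_fmt == target_fmt:
        # Same format: the serialized bytes can be copied verbatim.
        return list(blobs), 0
    converted, entries_parsed = [], 0
    for blob in blobs:
        # Different format: parse and re-serialize every inventory.
        root = ET.fromstring(blob)
        entries_parsed += len(root)
        root.set("format", target_fmt)
        converted.append(ET.tostring(root))
    return converted, entries_parsed

# A growing tree: revision i contains i + 1 files.
history = [serialize({"f%d" % j: "file%d.txt" % j for j in range(i + 1)}, "7")
           for i in range(100)]
same, parsed_same = fetch_inventories(history, "7", "7")
cross, parsed_cross = fetch_inventories(history, "7", "6")
print(parsed_same, parsed_cross)
```

In this model the same-format path parses nothing, while the cross-format path parses 5050 entries for 100 revisions, a cost that grows quadratically for a tree that keeps getting bigger.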

John A Meinel (jameinel) wrote on 2007-12-07:

This is mostly an issue for people who have been using bzr-svn and then migrate to the new release. I don't know of any other users of --dirstate-with-subtree.

It also suggests that we should probably be trying to make rich-root-pack the default format. Upgrading from dirstate-tags takes a bit longer (again because the inventory files have to be rebuilt), but it would help if we could stop recommending multiple disk formats at the same time. (By default we encourage pack-0.92, but bzr-svn recommends rich-root/rich-root-pack.)

Changed in bzr:
importance:	Undecided → Medium
status:	New → Triaged

Aaron Bentley (abentley) wrote on 2007-12-07:

Actually, I'm not sure which converter it's using. I think mine doesn't allow this conversion, because it won't always succeed.
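Aaron's point is that a conversion of this kind cannot always succeed. One likely reason, and this is our assumption rather than something stated in the thread, is that dirstate-with-subtree can record nested-tree references that a plain rich-root repository has no way to store. A converter could therefore scan the source first and refuse up front. The names below (`Entry`, `check_convertible`, `ConversionError`, the `"tree-reference"` kind string) are hypothetical, not bzrlib's API.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Entry:
    """Hypothetical inventory entry: just a path and a kind."""
    path: str
    kind: str  # e.g. "file", "directory", "tree-reference"

class ConversionError(Exception):
    pass

def check_convertible(inventories: Iterable[List[Entry]],
                      target_supports_tree_refs: bool) -> None:
    """Raise before converting if any revision uses a feature the target lacks."""
    if target_supports_tree_refs:
        return
    for revno, inventory in enumerate(inventories, start=1):
        for entry in inventory:
            if entry.kind == "tree-reference":
                raise ConversionError(
                    "revision %d has nested tree %r; target format cannot store it"
                    % (revno, entry.path))
```

Checking everything before writing anything means a refused conversion leaves the target repository untouched.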

Andrew Bennetts (spiv) wrote on 2008-02-27:

I have a few bzr-svn trees I want to convert from dirstate-with-subtree to rich-root-pack, so I also wish this were faster. Converting my import of Python trunk, I'm seeing less than 1 revision/sec, which seems pretty terrible.

One cheap (but slight) improvement is this patch:

=== modified file 'bzrlib/repofmt/pack_repo.py'
--- bzrlib/repofmt/pack_repo.py 2008-02-19 03:58:32 +0000
+++ bzrlib/repofmt/pack_repo.py 2008-02-27 07:24:26 +0000
@@ -1753,6 +1752,7 @@
         self.weavestore = weavestore
         # XXX for check() which isn't updated yet
         self._transport = weavestore._transport
+        self._transport_parent_dir = transport.clone('..')
 
     def get_weave_or_empty(self, file_id, transaction):
         """Get a 'Knit' backed by the .tix indices.
@@ -1769,7 +1769,7 @@
             add_callback=file_id_index.add_nodes,
             deltas=True, parents=True)
         return knit.KnitVersionedFile('text:' + file_id,
-            self.transport.clone('..'),
+            self._transport_parent_dir,
             None,
             index=knit_index,
             access_method=self.repo._pack_collection.text_index.knit_access,

That avoids many identical clones of a local transport, which is a little bit expensive.
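The patch hoists `transport.clone('..')` out of the per-file-id call and into the constructor. The sketch below models that change with a stand-in transport that counts `clone()` calls; `FakeTransport`, both store classes, and the paths are hypothetical, not bzrlib code. With caching there is one clone in total instead of one per `get_weave_or_empty()` call.

```python
import posixpath

class FakeTransport:
    """Stand-in for a bzrlib transport that counts clone() calls."""
    def __init__(self, base, stats):
        self.base = base
        self.stats = stats  # dict shared by every clone

    def clone(self, offset):
        self.stats["clones"] += 1
        new_base = posixpath.normpath(posixpath.join(self.base, offset))
        return FakeTransport(new_base, self.stats)

class UncachedTextStore:
    def __init__(self, transport):
        self.transport = transport

    def get_weave_or_empty(self, file_id):
        # Before the patch: a fresh parent-dir transport on every call.
        return "text:" + file_id, self.transport.clone("..")

class CachedTextStore:
    def __init__(self, transport):
        self.transport = transport
        # After the patch: clone once and reuse it for every file id.
        self._transport_parent_dir = transport.clone("..")

    def get_weave_or_empty(self, file_id):
        return "text:" + file_id, self._transport_parent_dir

def count_clones(store_cls, n_files):
    stats = {"clones": 0}
    store = store_cls(FakeTransport("/repo/store/indices", stats))
    for i in range(n_files):
        store.get_weave_or_empty("file-%d" % i)
    return stats["clones"]

print(count_clones(UncachedTextStore, 1000), count_clones(CachedTextStore, 1000))
```

Reusing a single object is only safe because the parent directory never changes after construction; that is the assumption the cached version relies on.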

Another thing that shows up in my profiling (20% according to lsprof/kcachegrind) is "has_version". That seems pretty high; I wonder whether newly added revisions aren't having their metadata cached. What I suspect is that, as it converts an essentially linear history, it might be doing:
  * add revision X
  * check has_revision(parents of X)
    * first revision, so has to hit disk
  * then add revision X+1  (which has X as the parent)
  * check has_revision(X)
    * even though we just added X, maybe we still hit disk here?
  * rinse and repeat...

I tried to verify this theory, but I don't know my way around the various index layers well enough to know if I'm looking in the right place.

Even if that's true, a 20% improvement on 1 revision per second is still pretty slow when converting 30000 revisions.
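Andrew's theory can be modelled in a few lines. In the hypothetical sketch below, none of the classes are bzrlib's and only the name `has_version` comes from the profile; a counting backend stands in for the on-disk index. Converting a linear history with a plain index costs one disk lookup per revision, while an index that remembers the keys it has just added answers the parent check from memory.

```python
class DiskIndex:
    """Stand-in for an on-disk index; counts lookups that would hit disk."""
    def __init__(self):
        self._keys = set()
        self.disk_lookups = 0

    def add(self, key):
        self._keys.add(key)

    def lookup(self, key):
        self.disk_lookups += 1
        return key in self._keys

class PlainIndex:
    def __init__(self, disk):
        self.disk = disk

    def add_version(self, key):
        self.disk.add(key)

    def has_version(self, key):
        return self.disk.lookup(key)

class RecentKeysIndex(PlainIndex):
    """Remembers keys added in this session so parent checks skip the disk."""
    def __init__(self, disk):
        super().__init__(disk)
        self._recent = set()

    def add_version(self, key):
        super().add_version(key)
        self._recent.add(key)

    def has_version(self, key):
        return key in self._recent or super().has_version(key)

def convert_linear_history(index, n_revisions):
    """Add revisions 0..n-1, checking each revision's parent first."""
    parent = None
    for i in range(n_revisions):
        if parent is not None and not index.has_version(parent):
            raise KeyError(parent)
        key = "rev-%d" % i
        index.add_version(key)
        parent = key
    return index.disk.disk_lookups

print(convert_linear_history(PlainIndex(DiskIndex()), 30000),
      convert_linear_history(RecentKeysIndex(DiskIndex()), 30000))
```

Whether bzrlib's index layers already did something like this at the time is exactly what could not be confirmed here; the sketch only shows why the pattern would matter for a long linear history.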

Robert Collins (lifeless) wrote on 2009-09-30:

We've heavily optimised cross format conversions during the 2.0 development cycle.

Changed in bzr:
status:	Triaged → Fix Released
milestone:	none → 2.0.0

Remote bug watches

debbugs #454504
[done normal]