too much data transferred making a new stacked branch

Bug #737234 reported by Martin Pool on 2011-03-17
This bug affects 2 people
Affects             Status        Importance  Assigned to      Milestone
bzr                 Fix Released  High        John A Meinel    2.4b2
bzr (Ubuntu)        Fix Released
bzr (Ubuntu Natty)  Fix Released  High        Jelmer Vernooij

Bug Description

In thread "Linaro bzr feedback" John writes:

Note, I just did 'bzr branch lp:gcc-linaro', and it transferred about
500MB, about 457MB on disk. (Not bad considering lp:emacs transferred
400-500MB and was only 200MB on disk.)

I then ran 'bzr serve' and 'bzr branch --stacked bzr://localhost:...'.
What was scary was:

8141442kB 24128kB/s / Finding Revisions
> Grepping the .bzr.log file in question, I do, indeed see about 8.1GB of
> data transferred before we read the first .tix.
> If my grep fu is strong, then we only read 30MB of .cix data. Which
> leaves us with 8GB of .pack content, or actual CHK page content.

This is a change which drops the 8GB down to 150MB:

=== modified file 'bzrlib/'
--- bzrlib/ 2010-09-14 13:12:20 +0000
+++ bzrlib/ 2011-03-17 15:38:40 +0000
@@ -736,6 +736,13 @@
             specific_file_ids = set(specific_file_ids)
         # TODO? Perhaps this should return the from_dir so that the root is
         # yielded? or maybe an option?
+        if from_dir is None and specific_file_ids is None:
+            # They are iterating from the root, assume they are iterating
+            # everything and preload all file_ids into the
+            # _fileid_to_entry_cache. This doesn't build things into
+            # for each directory, but that will happen later.
+            for _ in self.iter_just_entries():
+                continue
         if from_dir is None:
             if self.root is None:

Basically, iter_entries_by_dir goes in a specific order which doesn't
match the order in the repository. 'iter_just_entries' loads everything
in repository order, and puts it into the
CHKInventory._fileid_to_entry_cache, and then the rest of the requests are
fed from there.
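
The effect described above can be illustrated with a toy model. This is only a sketch, not bzrlib code: ToyRepository, ToyInventory, and count_page_reads are hypothetical names, and the model assumes that a lookup which misses the entry cache costs one page read and caches only the entry it asked for. Only iter_just_entries, iter_entries_by_dir, and _fileid_to_entry_cache are names taken from the patch.

```python
class ToyRepository:
    """Hypothetical stand-in for a repository: entries are grouped into pages."""

    def __init__(self, pages):
        self.pages = pages  # list of dicts mapping file_id -> entry
        self.page_reads = 0  # counts every page fetched from "disk"/"network"
        self._page_of = {fid: n for n, page in enumerate(pages) for fid in page}

    def read_page(self, n):
        self.page_reads += 1
        return self.pages[n]

    def read_entry(self, file_id):
        # A single-entry lookup still has to fetch the whole page.
        return self.read_page(self._page_of[file_id])[file_id]


class ToyInventory:
    """Hypothetical inventory with an entry cache, loosely modelled on the patch."""

    def __init__(self, repo, dir_order):
        self.repo = repo
        self.dir_order = dir_order  # file ids in by-directory order
        self._fileid_to_entry_cache = {}

    def iter_just_entries(self):
        # Walk pages in repository order, caching every entry seen.
        for n in range(len(self.repo.pages)):
            for file_id, entry in self.repo.read_page(n).items():
                self._fileid_to_entry_cache[file_id] = entry
                yield entry

    def iter_entries_by_dir(self, preload):
        if preload:
            # Same idea as the patch: exhaust the cheap iterator first.
            for _ in self.iter_just_entries():
                continue
        for file_id in self.dir_order:
            entry = self._fileid_to_entry_cache.get(file_id)
            if entry is None:
                entry = self.repo.read_entry(file_id)
                self._fileid_to_entry_cache[file_id] = entry
            yield entry


def count_page_reads(preload):
    # 4 pages of 3 entries each; directory order jumps between pages.
    pages = [{f"f{p * 3 + i}": f"entry-{p * 3 + i}" for i in range(3)}
             for p in range(4)]
    dir_order = [f"f{p * 3 + i}" for i in range(3) for p in range(4)]
    repo = ToyRepository(pages)
    entries = list(ToyInventory(repo, dir_order).iter_entries_by_dir(preload))
    return entries, repo.page_reads
```

In this model, count_page_reads(preload=False) performs 12 page reads (one per entry), while count_page_reads(preload=True) performs 4 (one per page) and yields the same entries in the same order.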

We don't usually notice this effect, because of the
chk_map._thread_caches.page_cache and the GCCHKRepository block cache.
Once the inventory is large enough to not be in the bytes cache, we have
to load it from the repository again.
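
Why a cache stops helping once the data outgrows it can be sketched with a small least-recently-used cache. This is an illustration under assumptions, not the actual bzrlib caches: LRUPageCache and fetches_for are hypothetical names, and capacity is counted in pages here rather than in bytes.

```python
from collections import OrderedDict


class LRUPageCache:
    """Hypothetical least-recently-used cache holding at most `capacity` pages."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch  # callable that loads a page on a miss
        self.fetches = 0
        self._pages = OrderedDict()

    def get(self, key):
        if key in self._pages:
            self._pages.move_to_end(key)  # mark as most recently used
            return self._pages[key]
        self.fetches += 1
        value = self.fetch(key)
        self._pages[key] = value
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)  # evict the least recently used
        return value


def fetches_for(capacity, n_pages, passes):
    """Count fetches when `n_pages` pages are scanned in order `passes` times."""
    cache = LRUPageCache(capacity, fetch=lambda key: "page-%d" % key)
    for _ in range(passes):
        for key in range(n_pages):
            cache.get(key)
    return cache.fetches
```

With 5 pages scanned 3 times, fetches_for(5, 5, 3) needs 5 fetches, but fetches_for(4, 5, 3) needs 15: as soon as the working set is one page larger than the cache, every access in this pattern misses.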

I just checked, and this also has a large effect for local repositories.

'time list(rev_tree.inventory.iter_entries_by_dir())'
drops from 4m30s down to 13s with the patch.
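
That is roughly a 20x speedup (270s down to 13s). A measurement of this kind could be sketched as follows; time_full_iteration is a hypothetical helper, not part of bzrlib, and it assumes Python 3 for time.perf_counter. With bzrlib one would presumably pass rev_tree.inventory.iter_entries_by_dir as the callable.

```python
import time


def time_full_iteration(make_iterator):
    """Exhaust the iterator returned by `make_iterator` and time it.

    Returns (number of items yielded, elapsed seconds).
    """
    start = time.perf_counter()
    count = sum(1 for _ in make_iterator())
    return count, time.perf_counter() - start


# Usage with a stand-in iterator; a real run would use the inventory method.
count, seconds = time_full_iteration(lambda: iter(range(1000)))
```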

So we certainly should think about other ramifications, but short term
it looks quite good.

Martin Pool (mbp) on 2011-03-17
Changed in bzr:
status: New → In Progress
importance: Undecided → High
assignee: nobody → John A Meinel (jameinel)
tags: added: affects-linaro performance stacking
John A Meinel (jameinel) on 2011-03-23
Changed in bzr:
status: In Progress → Fix Released
milestone: none → 2.4b2
Dimitrios Apostolou (jimis) wrote :

I applied the patch to bzr v2.3.1 and can confirm that it fixes the issue I was facing. Both a lightweight checkout and a stacked branch of lp:gcc now download ~540MB in only 20min, a huge improvement (previously it was 8GB over 5h), which makes these operations truly lightweight over the network. Thank you for the fix.

The directories downloaded in both cases are 625MB in size as reported by du, so I no longer see an issue with using these operations over the network.

Jelmer Vernooij (jelmer) on 2011-06-08
Changed in bzr (Ubuntu):
status: New → Fix Released
Jelmer Vernooij (jelmer) on 2011-06-08
Changed in bzr (Ubuntu Natty):
status: New → In Progress
Jelmer Vernooij (jelmer) on 2011-06-10
Changed in bzr (Ubuntu Natty):
importance: Undecided → High
assignee: nobody → Jelmer Vernooij (jelmer)

Changed in bzr (Ubuntu Natty):
status: In Progress → Fix Committed
tags: added: verification-needed
Clint Byrum (clint-fewbar) wrote :

Hello Martin, or anyone else affected,

Accepted bzr into natty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See for documentation how to enable and use -proposed. Thank you in advance!

Jelmer Vernooij (jelmer) wrote :

Verified by running the bzr testsuite from the package in a clean natty install.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package bzr - 2.3.4-0ubuntu1

bzr (2.3.4-0ubuntu1) natty-proposed; urgency=low

  * New upstream release.
   + Fix bzr version number in deprecation warnings. LP: #794960
   + Prevent write attempts on remote branch during "bzr up". LP: #786980
   + Fix conflict handling when two trees involved in a merge have different
     root ids. LP: #805809

bzr (2.3.3-0ubuntu1) natty-proposed; urgency=low

  * New upstream release.
   + Fixes deprecation warning on newer versions of Python. LP: #760435
   + Stops 'bzr push' from copying entire repository if a .bzr directory is
     present without a branch. LP: #465517
   + Fixes undefined local variable error when waiting for lock. LP: #733136
   + Fixes lock contention issues pushing to a bound branch. LP: #733350
   + Transfers less data creating a new stacked branch. LP: #737234
   + Several fixes to the test suite, making it more robust. LP: #654733,
      LP: #751824
   + 'bzr merge --pull --preview' actually shows a preview rather than
     actually merging. LP: #760152
   + bzr smart server now supports UTF-8 user names. LP: #659763
   + user identity can now be set based on username and /etc/mailname, not
     requiring it to be set manually. LP: #616878
   + stacking is now fully transitive. LP: #715000
   + makes in-terminal crash report of plugins much shorter. LP: #716389
 -- Jelmer Vernooij <email address hidden> Thu, 14 Jul 2011 21:12:58 +0200

Changed in bzr (Ubuntu Natty):
status: Fix Committed → Fix Released