too much data transferred making a new stacked branch
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Bazaar | Fix Released | High | John A Meinel |
2.3 | Fix Released | High | John A Meinel |
bzr (Ubuntu) | Fix Released | Undecided | Unassigned |
Natty | Fix Released | High | Jelmer Vernooij |
Bug Description
In thread "Linaro bzr feedback" John writes:
Note, I just did 'bzr branch lp:gcc-linaro', and it transferred about
500MB, about 457MB on disk. (Not bad considering lp:emacs transferred
400-500MB and was only 200MB on disk.)
I then ran 'bzr serve' and 'bzr branch --stacked bzr://localhost
What was scary was:
8141442kB 24128kB/s / Finding Revisions
...
> Grepping the .bzr.log file in question, I do, indeed see about 8.1GB of
> data transferred before we read the first .tix.
> If my grep fu is strong, then we only read 30MB of .cix data. Which
> leaves us with 8GB of .pack content, or actual CHK page content.
This is a change which drops the 8GB down to 150MB:
=== modified file 'bzrlib/
--- bzrlib/inventory.py 2010-09-14 13:12:20 +0000
+++ bzrlib/inventory.py 2011-03-17 15:38:40 +0000
@@ -736,6 +736,13 @@
# TODO? Perhaps this should return the from_dir so that the root is
# yielded? or maybe an option?
+ if from_dir is None and specific_file_ids is None:
+ # They are iterating from the root, assume they are iterating
+ # everything and preload all file_ids into the
+ # _fileid_
.children
+ # for each directory, but that will happen later.
+ for _ in self.iter_
+ continue
if from_dir is None:
if self.root is None:
Basically, iter_entries_by_dir goes in a specific order which doesn't
match the order in the repository. 'iter_just_entries' loads everything
in repository order, and puts it into the
CHKInventory.
fed from there.
We don't usually notice this effect, because of the
chk_map.
Once the inventory is large enough to not be in the bytes cache, we have
to load it from the repository again.
I just checked, and this also has a large effect for local repositories.
'time list(rev_
drops from 4m30s down to 13s with the patch.
So we certainly should think about other ramifications, but short term
it looks quite good.
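The effect described above can be sketched with a toy model. This is not bzrlib code: `PageStore`, `page_of`, the page layout and the cache size are all hypothetical, chosen only to show why walking entries in an order that does not match storage order defeats a small page cache, while one pass in storage order that fills a per-entry cache avoids the repeated fetches.

```python
from collections import OrderedDict

# Hypothetical sizes, chosen only for illustration.
NUM_ENTRIES = 1000   # inventory entries, numbered in "directory order"
NUM_PAGES = 50       # storage pages that hold those entries
CACHE_SIZE = 8       # page cache much smaller than the number of pages


def page_of(file_id):
    """Assumed layout: neighbouring entries live on different pages."""
    return file_id % NUM_PAGES


def build_pages():
    """Group entry ids by the page that stores them."""
    pages = {key: [] for key in range(NUM_PAGES)}
    for file_id in range(NUM_ENTRIES):
        pages[page_of(file_id)].append(file_id)
    return pages


class PageStore:
    """Toy repository: fetches pages by key through a small LRU cache."""

    def __init__(self, pages, cache_size):
        self._pages = pages
        self._cache = OrderedDict()
        self._cache_size = cache_size
        self.fetches = 0  # number of cache misses, i.e. simulated reads

    def get(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)
            return self._cache[key]
        self.fetches += 1
        page = self._pages[key]
        self._cache[key] = page
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict least recently used
        return page


def iterate_on_demand(store):
    """Walk entries in directory order, fetching each entry's page as needed."""
    result = []
    for file_id in range(NUM_ENTRIES):
        page = store.get(page_of(file_id))
        result.append(page[page.index(file_id)])
    return result


def iterate_with_preload(store):
    """Read every page once in storage order, then serve from an entry cache."""
    entry_cache = {}
    for key in range(NUM_PAGES):
        for file_id in store.get(key):
            entry_cache[file_id] = file_id
    return [entry_cache[file_id] for file_id in range(NUM_ENTRIES)]


on_demand = PageStore(build_pages(), CACHE_SIZE)
on_demand_result = iterate_on_demand(on_demand)

preloaded = PageStore(build_pages(), CACHE_SIZE)
preloaded_result = iterate_with_preload(preloaded)

print("on-demand page fetches:", on_demand.fetches)
print("preloaded page fetches:", preloaded.fetches)
```

In this deliberately worst-case layout every on-demand lookup misses the page cache, so the walk costs one fetch per entry, while the preloading walk costs one fetch per page. Real repositories will sit somewhere in between; the model only illustrates the mechanism.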
Related branches
- Jelmer Vernooij (community): Approve (code)
- Vincent Ladeuil: Approve
- Diff: 224 lines (+158/-2), 3 files modified:
  bzrlib/inventory.py (+69/-2)
  bzrlib/tests/test_inv.py (+82/-0)
  doc/en/release-notes/bzr-2.4.txt (+7/-0)
- bzr-core: Pending requested
- Diff: 223 lines (+158/-2), 3 files modified:
  bzrlib/inventory.py (+69/-2)
  bzrlib/tests/test_inv.py (+82/-0)
  doc/en/release-notes/bzr-2.3.txt (+7/-0)
Changed in bzr:
status: New → In Progress
importance: Undecided → High
assignee: nobody → John A Meinel (jameinel)
tags: added: affects-linaro performance stacking
Changed in bzr:
status: In Progress → Fix Released
milestone: none → 2.4b2
Changed in bzr (Ubuntu):
status: New → Fix Released
Changed in bzr (Ubuntu Natty):
status: New → In Progress
Changed in bzr (Ubuntu Natty):
importance: Undecided → High
assignee: nobody → Jelmer Vernooij (jelmer)
I applied the patch to bzr v2.3.1 and can confirm that it fixes the issue I was facing. Both a lightweight checkout and a stacked branch of lp:gcc now download ~540MB in about 20 minutes, a huge improvement over the previous 8GB in 5 hours, which makes these operations truly lightweight over the network. Thank you for the fix.
In both cases the downloaded directory is 625MB in size as reported by du, so I no longer see any problem with using these operations over the network.
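As a rough check on the figures in this comment, the arithmetic can be written out. This is only a sketch: it assumes 1GB = 1024MB and treats the quoted sizes and durations as exact, which they are not.

```python
MB_PER_GB = 1024  # assumption; the comment does not say which unit is meant

# Figures quoted in the comment above.
before_mb = 8 * MB_PER_GB   # about 8GB before the patch
before_s = 5 * 3600         # about 5 hours
after_mb = 540              # about 540MB after the patch
after_s = 20 * 60           # about 20 minutes

data_ratio = before_mb / after_mb   # how much less data is transferred
time_ratio = before_s / after_s     # how much less time it takes

before_rate = before_mb / before_s  # effective MB/s before
after_rate = after_mb / after_s     # effective MB/s after

print("data reduced by a factor of %.1f" % data_ratio)
print("time reduced by a factor of %.1f" % time_ratio)
print("throughput before: %.2f MB/s, after: %.2f MB/s" % (before_rate, after_rate))
```

Under these assumptions the data and time ratios are both close to 15 and the effective throughput is nearly unchanged, which is consistent with the time saving coming from transferring less data rather than from a faster link.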