local clone without shared repository is too slow

Bug #116094 reported by SoloTurn
This bug affects 2 people
Affects: Bazaar
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

The number of files is approximately the same. Cloning a repository takes 60 times longer with bzr-0.16 than with hg-0.93.

$ time bzr clone bzr.dev/ bzr.dev2
Branched 2488 revision(s).

real 1m31.447s
user 0m13.230s
sys 0m1.490s

$ time hg clone hg-head hg-head2
528 files updated, 0 files merged, 0 files removed, 0 files unresolved

real 0m1.220s
user 0m0.560s
sys 0m0.170s
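
As a rough check on the timings above, the two `real` values can be converted to seconds and divided. This is only a sketch: the helper name `real_seconds` is made up for the illustration, and the only inputs taken from the report are the two timing strings.

```python
import re


def real_seconds(stamp):
    """Convert a bash `time` stamp such as '1m31.447s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError("unrecognised time stamp: %r" % stamp)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


bzr_real = real_seconds("1m31.447s")  # 'real' line of the bzr clone run
hg_real = real_seconds("0m1.220s")    # 'real' line of the hg clone run
ratio = bzr_real / hg_real
print("bzr clone took about %.0f times as long as hg clone" % ratio)
```

On the wall-clock figures quoted here the ratio comes out at roughly 75.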

Tags: performance
SoloTurn (soloturn) wrote :

Mercurial has twice as many revisions and 15% fewer files.

John A Meinel (jameinel) wrote :

There are things we need to work on, but some of the information here is misleading.

1) What is the "bzr clone" time in a shared repository? Do:
  bzr init-repo --trees .
  bzr branch upstream test1
  time bzr branch test1 test2

I would imagine that we might actually be faster than hg within a shared repository. I would expect us to at least be close.
hg uses hardlinks, which aren't safe when accessed remotely. We've discussed if we can do it in a different way (perhaps
having a flag that this repository isn't remote-access safe, etc).

2) Actually, the way we measure revisions is different. bzr has almost 2.5x the number of revisions. Specifically, bzr distinguishes between mainline revisions (things committed on *this* branch), and merged revisions. bzr has 2487 mainline revisions, and 11326 total revisions.
  You can get that with "bzr revision-history | wc -l" and "bzr ancestry | wc -l" respectively.
  'bzr log' numbering is based on mainline revisions.
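
To make the mainline versus total distinction concrete, here is a minimal sketch on a toy revision graph. It is not bzrlib code: the graph, the revision names and both function names are invented for illustration. The mainline follows only the first parent of each revision, while the ancestry is everything reachable through any parent.

```python
def mainline(graph, tip):
    """Revisions reached by following only the first parent from tip."""
    revisions = []
    rev = tip
    while rev is not None:
        revisions.append(rev)
        parents = graph[rev]
        rev = parents[0] if parents else None
    return revisions


def ancestry(graph, tip):
    """Every revision reachable from tip through any parent."""
    seen = set()
    pending = [tip]
    while pending:
        rev = pending.pop()
        if rev in seen:
            continue
        seen.add(rev)
        pending.extend(graph[rev])
    return seen


# Toy history: r1 <- r2 <- r3 on this branch; f1 and f2 were committed
# on a feature branch and merged in by r3 (second parent).
toy_graph = {
    "r1": [],
    "r2": ["r1"],
    "f1": ["r1"],
    "f2": ["f1"],
    "r3": ["r2", "f2"],
}

mainline_count = len(mainline(toy_graph, "r3"))
total_count = len(ancestry(toy_graph, "r3"))
print(mainline_count, "mainline revisions,", total_count, "total revisions")
```

In this toy graph the merge adds two revisions to the total without lengthening the mainline, which is the same effect that separates the 2487 and 11326 figures above.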

Jussi Pakkanen (jpakkane) wrote :

I have the following bzr repo:

Format:
       control: Meta directory format 1
  working tree: Working tree format 3
        branch: Branch format 5
    repository: Knit repository format 1

In the working tree:
      3393 unchanged
       107 versioned subdirectories

Branch history:
        98 revisions
       124 days old
   first revision: Tue 2007-03-06 12:24:53 +0200
  latest revision: Wed 2007-05-09 20:39:07 +1000

Revision store:
       118 revisions
     18804 KiB

I converted it to hg and git using tailor. I measured the time taken to branch the repo. Every test was done twice and only the last one was measured, so the source repo would be in cache. I also tested cp since bzr docs say you can branch with it as well. Here are the results:

bzr branch 43.21s user 4.98s system 94% cpu 51.174 total

hg clone 4.63s user 1.48s system 93% cpu 6.546 total

git clone 3.68s user 0.86s system 92% cpu 4.933 total

cp 0.16s user 2.09s system 17% cpu 12.811 total

Bzr is clearly the slowest. It should not be noticeably slower than cp.

Tests run on OpenSUSE 10.2, Bzr was version 0.17, hg was 0.9.1 and git was 1.4.3.4.

Robert Collins (lifeless) wrote : It's not obvious that bzr branch is safer than hg clone

On Sun, 2007-07-08 at 15:06 +0000, JussiP wrote:
> I have the following bzr repo:

> I converted it to hg and git using tailor. I measured the time taken to
> branch the repo. Every test was done twice and only the last one was
> measured, so the source repo would be in cache. I also tested cp since
> bzr docs say you can branch with it as well. Here are the results:
>
> bzr branch 43.21s user 4.98s system 94% cpu 51.174 total

You are clearly not using a shared repository; shared repositories
provide shared storage between branches and will massively increase the
performance of branch.

> hg clone 4.63s user 1.48s system 93% cpu 6.546 total
>
> git clone 3.68s user 0.86s system 92% cpu 4.933 total
>
> cp 0.16s user 2.09s system 17% cpu 12.811 total
>
> Bzr is clearly the slowest. It should not be noticeably slower than cp.

There are some hidden assumptions here. The first is that bzr branch is
equivalent to cp - it's not. bzr branch is performing integrity checking
on the repository as it copies data, which cp does not do (and it's
likely your hg clone does not do that either as hg defaults to
hardlinking - which means that we know the entire branch history is
valid, but you don't with hg).
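
The difference being described can be sketched in a few lines of Python. This is a generic illustration, not how bzr or hg are implemented: the function names, the file names and the use of SHA-1 are all assumptions. A hardlink only adds a second name for the same stored bytes, so nothing is read or checked; a validating copy has to read every byte, which is where the extra time goes.

```python
import hashlib
import os
import tempfile


def hardlink_clone(src, dst):
    """Make dst a second name for the same file; no bytes are read."""
    os.link(src, dst)


def validating_copy(src, dst, expected_sha1):
    """Copy src to dst, hashing every byte and rejecting a mismatch."""
    digest = hashlib.sha1()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for chunk in iter(lambda: fin.read(65536), b""):
            digest.update(chunk)
            fout.write(chunk)
    if digest.hexdigest() != expected_sha1:
        os.remove(dst)
        raise ValueError("checksum mismatch while copying %s" % src)


def demo():
    """Return (link shares storage, copy is independent) for a sample file."""
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "history.dat")
    payload = b"revision data\n" * 1000
    with open(src, "wb") as handle:
        handle.write(payload)
    expected = hashlib.sha1(payload).hexdigest()

    linked = os.path.join(workdir, "linked.dat")
    copied = os.path.join(workdir, "copied.dat")
    hardlink_clone(src, linked)
    validating_copy(src, copied, expected)

    shares_storage = os.path.samefile(src, linked)
    independent = not os.path.samefile(src, copied)
    return shares_storage, independent


print(demo())
```

The hardlinked file is the same inode as the source, so corruption in one is corruption in both and would go unnoticed at clone time; the validated copy is independent and would have raised on a bad checksum.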

-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Jussi Pakkanen (jpakkane) wrote : Re: bzr clone takes 60 times hg clone

I created a shared repository with

bzr init-repo --trees .

created a branch in this directory and then branched that. The result:

bzr branch head test 29.63s user 2.62s system 98% cpu 32.751 total

This gives a noticeable speedup, but bzr is still slower than hg and git. Without --trees it is very fast, so populating the source tree seems to be the slow operation.

I tested this by copying the .bzr directory to another dir and then issuing 'bzr revert'. Then I did the same with hg. Here are the times:

bzr revert 45.13s user 2.94s system 98% cpu 48.585 total
hg revert 5.25s user 1.21s system 86% cpu 7.493 total

Martin Pool (mbp)
Changed in bzr:
importance: Undecided → High
status: New → Triaged
Aaron Bentley (abentley) wrote :

This is somewhat improved by my recent build_tree updates. AIUI, we still lag behind Mercurial, though.

Johan Walles (walles)
tags: added: performance
SoloTurn (soloturn) wrote :

Cloning Launchpad according to the description on http://dev.launchpad.net never finishes - or at least not in a time anybody has patience for (6 hours on a 1.0 GHz Solaris box).

Martin Pool (mbp) wrote : Re: [Bug 116094] Re: local clone without shared repository is too slow

2009/8/2 SoloTurn <email address hidden>:
> Cloning Launchpad according to the description on
> http://dev.launchpad.net never finishes - or at least not in a time
> anybody has patience for (6 hours on a 1.0 GHz Solaris box).

Please open a new bug with specific details.

--
Martin <http://launchpad.net/~mbp/>

Changed in bzr:
status: Triaged → Fix Released
Robert Collins (lifeless) wrote :

I'm marking this as fix released for the following reasons:
 - comparing to hg one has to use shared repositories - hg hardlinks its database files together, which *doesn't actually clone or validate the data*. The analogous operation in bzr is branching within a shared repository, so that benchmark should be used, and with 2a we're very fast - faster than untarring a bz2 file in some benchmarks we did.
 - comparing to a git clone over the wire is more reasonable, and we've improved a lot here too - I did a few tests not long ago showing both git and bzr were capped at network speed, where repository size starts to dominate
 - repository size is a big factor, and our 2a format brings that into line with git and hg.

I'm certainly supportive of bugs like this, but we should try to keep them focused, and this has become kind of a meta one, which is why I'm closing it: it's not a particularly useful bookmark for developers or users.

Ian Clatworthy (ian-clatworthy) wrote : Re: [Bug 116094] Re: local clone without shared repository is too slow

Robert Collins wrote:
> I'm marking this as fix released for the following reasons:
> - comparing to hg one has to use shared repositories - hg hardlinks its database files together, which *doesn't actually clone or validate the data*. The analogous operation in bzr is branching within a shared repository, so that benchmark should be used, and with 2a we're very fast - faster than untarring a bz2 file in some benchmarks we did.
> - comparing to a git clone over the wire is more reasonable, and we've improved a lot here too - I did a few tests not long ago showing both git and bzr were capped at network speed, where repository size starts to dominate
> - repository size is a big factor, and our 2a format brings that into line with git and hg.

My desktop is only months old with an i7 920 and 6G of memory. Branching
OOo locally outside a shared repository takes 20 minutes!!! The bug
isn't fixed so we shouldn't claim it is.

We could still do a lot to improve things:

* Disable data validation by default. If the source branch is busted,
  creating each and every feature branch is *not* the time IMO to be
  detecting the problem. We have 'bzr check' and it's the better
  tool for validating the source branch integrity than running some
  checks (but not all) during a branch (and only if the branch isn't
  already in a shared repo).

* Detect that the target isn't in a shared repo and warn the user
  that it will take a long time. Explorer does something like this
  now: it suggests the user create a shared repo and pops up the
  dialog for doing that if they agree.

* Some other UI change so shared repositories are created by default.
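
The second suggestion above (detect that the target is not inside a shared repository and warn) could look roughly like the sketch below. It is not bzr or Explorer code: the function names are invented, and the marker path `.bzr/repository/shared-storage` is an assumption made for illustration about how a shared repository might be recognised on disk.

```python
import os
import tempfile

# Assumed marker for a shared repository; treat this path as hypothetical.
MARKER = os.path.join(".bzr", "repository", "shared-storage")


def find_shared_repo(start):
    """Walk up from start and return the first directory holding MARKER."""
    path = os.path.abspath(start)
    while True:
        if os.path.exists(os.path.join(path, MARKER)):
            return path
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            return None
        path = parent


def branch_warning(target):
    """Return a warning string if target would not land in a shared repo."""
    container = os.path.dirname(os.path.abspath(target))
    if find_shared_repo(container) is None:
        return ("warning: %s is not inside a shared repository; "
                "all history will be copied and this may be slow" % target)
    return None


# Small demonstration with throwaway directories.
shared_root = tempfile.mkdtemp()
os.makedirs(os.path.join(shared_root, ".bzr", "repository"))
open(os.path.join(shared_root, MARKER), "w").close()
standalone_root = tempfile.mkdtemp()

print(branch_warning(os.path.join(shared_root, "feature")))
print(branch_warning(os.path.join(standalone_root, "feature")))
```

The check only inspects directory names, so it is cheap enough to run before every branch operation.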

> I'm certainly supportive of bugs like this, but we should try to keep
> them focused, and this has become kind of a meta one, which is why I'm
> closing it: it's not a particularly useful bookmark for developers or
> users.

I disagree. This problem is a major source of naive users blogging that
"bzr is still slow. I grabbed a branch and cloned it and it took
forever". Our out-of-the-box behaviour, on the command line at least,
leads to terrible performance on the 'create a feature branch' use case.

Ian C.

Robert Collins (lifeless) wrote :

On Mon, 2009-09-28 at 03:13 +0000, Ian Clatworthy wrote:
> Robert Collins wrote:
> > I'm marking this as fix released for the following reasons:
> > - comparing to hg one has to use shared repositories - hg hardlinks its database files together, which *doesn't actually clone or validate the data*. The analogous operation in bzr is branching within a shared repository, so that benchmark should be used, and with 2a we're very fast - faster than untarring a bz2 file in some benchmarks we did.
> > - comparing to a git clone over the wire is more reasonable, and we've improved a lot here too - I did a few tests not long ago showing both git and bzr were capped at network speed, where repository size starts to dominate
> > - repository size is a big factor, and our 2a format brings that into line with git and hg.
>
> My desktop is only months old with an i7 920 and 6G of memory. Branching
> OOo locally outside a shared repository takes 20 minutes!!! The bug
> isn't fixed so we shouldn't claim it is.

That copies all the history; I think 20 minutes is fine to do that much
work.

The question is 'is 20 minutes appropriate for cloning all the history
of a project the size of OOo' ? And I think it is.

> We could still do a lot to improve things:
>
> * Disable data validation by default. If the source branch is busted,
> creating each and every feature branch is *not* the time IMO to be
> detecting the problem. We have 'bzr check' and it's the better
> tool for validating the source branch integrity than running some
> checks (but not all) during a branch (and only if the branch isn't
> already in a shared repo).

Strongly against this. I think it's a terrible idea. We've had terrible
outcomes in the past when we missed a step, and if we're going to change
this we should be adding stricter validation, not discarding it.

> * Detect that the target isn't in a shared repo and warn the user
> that it will take a long time. Explorer does something like this
> now: it suggests the user create a shared repo and pops up the
> dialog for doing that if they agree.

That might be reasonable; certainly the UI focus on working with groups
of branches should be a component here.

> * Some other UI change so shared repositories are created by default.

Or some other mechanism to work with lots of branches in a single repo;
sure.

> > I'm certainly supportive of bugs like this, but we should try to keep
> > them focused, and this has become kind of a meta one, which is why I'm
> > closing it: it's not a particularly useful bookmark for developers or
> > users.
>
> I disagree. This problem is a major source of naive users blogging that
> "bzr is still slow. I grabbed a branch and cloned it and it took
> forever". Our out-of-the-box behaviour, on the command line at least,
> leads to terrible performance on the 'create a feature branch' use case.

Agreed. But *this bug* is about the specific 'how long it takes to do a
clone *outside* a shared repository'.

That time is appropriately long, IMO - we've put a lot of work into
making it faster than it was - and your OOo results are a testament to
the effect we've had.

Bugs like 'default setup causes users ...


Ian Clatworthy (ian-clatworthy) wrote :

Robert Collins wrote:

> The question is 'is 20 minutes appropriate for cloning all the history
> of a project the size of OOo' ? And I think it is.

"time cp -a OOo-trunk OOo-fix" takes 1m24secs and that's starting from a
cold cache.

90 seconds is a longer break in concentration than most developers will
accept to start a feature branch. Even so, they can grab a cup of coffee
while it's happening (multiple times per day). 20 minutes for "bzr
branch" is *completely* unacceptable IMO. (They only stop for lunch once
a day.)

I don't have recent figures but I believe git and hg are both in the
ballpark for cp -a (or better). We cannot be 10-20X slower and expect
users not to complain.

Ian C.

Robert Collins (lifeless) wrote :

On Mon, 2009-09-28 at 04:42 +0000, Ian Clatworthy wrote:
> Robert Collins wrote:
>
> > The question is 'is 20 minutes appropriate for cloning all the history
> > of a project the size of OOo' ? And I think it is.
>
> "time cp -a OOo-trunk OOo-fix" takes 1m24secs and that's starting from a
> cold cache.
>
> 90 seconds is a longer break in concentration than most developers will
> accept to start a feature branch. Even so, they can grab a cup of coffee
> while it's happening (multiple times per day). 20 minutes for "bzr
> branch" is *completely* unacceptable IMO. (They only stop for lunch once
> a day.)
>
> I don't have recent figures but I believe git and hg are both in the
> ballpark for cp -a (or better). We cannot be 10-20X slower and expect
> users not to complain.

If users want to copy without validation, I think they should copy. cp
-a is effectively ideal for this - why should we reimplement it!

The *primary* fix for this is 'do not copy history to make new
branches', and making that more accessible is very important: but it's
still *not this bug*.

-Rob

Robert Collins (lifeless) wrote :

Apparently this is Won't Fix rather than Fix Released.

Changed in bzr:
status: Fix Released → Won't Fix
Ian Clatworthy (ian-clatworthy) wrote :

Here are the 2.0.0 performance figures on an import of FireFox 3.5 on 2a format. The tree is large (40k files) and the repository is relatively shallow (19k revisions).

1. cp -a takes 59.7 seconds
2. bzr branch takes 4m43secs

By way of comparison:

3. 'hg clone --pull' (no hardlinks) takes 2m1sec
4. 'hg clone' (hardlinks by default) takes 52.4 secs.

So the apples-to-apples comparison (2 vs 3) implies bzr is still more than 100% slower than hg on this particular operation. That's clearly much better than the original 60X slower so I understand the desire to say this is now fixed. OTOH, it's likely users will simply re-open it as long as the gap is over 50%.
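
For reference, the arithmetic behind that comparison can be spelled out as below. Only the four timings come from the figures above; the helper names are made up for the illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes-plus-seconds timing to seconds."""
    return minutes * 60 + seconds


def percent_slower(slow, fast):
    """How much longer `slow` takes than `fast`, as a percentage of `fast`."""
    return (slow - fast) / fast * 100.0


cp_a = 59.7                     # 1. cp -a
bzr_branch = to_seconds(4, 43)  # 2. bzr branch
hg_pull = to_seconds(2, 1)      # 3. hg clone --pull (no hardlinks)
hg_hardlink = 52.4              # 4. hg clone (hardlinks)

gap = percent_slower(bzr_branch, hg_pull)
target = hg_pull * 1.5          # time bzr would need to be within 50% of hg

print("bzr branch vs hg clone --pull: %.0f%% slower" % gap)
print("bzr branch vs cp -a: %.1f times as long" % (bzr_branch / cp_a))
print("time needed to be within 50%% of hg clone --pull: %.1f s" % target)
```

With these inputs the gap between 2 and 3 is about 134%, and bzr branch would need to drop from 283 seconds to about 181.5 seconds to be within 50% of hg clone --pull.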

Ian Clatworthy (ian-clatworthy) wrote :

I don't want to recommend users use 'cp -a' because it doesn't lock the source. Under the right circumstances (source and target both local and both outside a shared repo), 'bzr branch' could always do exactly that under the covers after locking the 2 branches. I think that would meet the expectations of most users wrt what clone/branch does, though it doesn't meet everyone's - see above.

Unless we make that optimisation (either implicitly or via an option), I can't see how bzr can get within 50% of hg clone --pull. If we do make that optimisation, we may even be quicker.
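
A minimal sketch of that idea, assuming a simple lock made by creating a directory next to each branch (mkdir fails if the directory already exists, so only one holder can succeed). The lock location, the `.lock` suffix and both function names are hypothetical and are not taken from bzr.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def dir_lock(lock_path):
    """Hold a lock by creating a directory; mkdir fails if it exists."""
    os.mkdir(lock_path)
    try:
        yield
    finally:
        os.rmdir(lock_path)


def locked_copy(src, dst):
    """Copy the whole of src to dst while holding locks for both sides."""
    with dir_lock(src + ".lock"), dir_lock(dst + ".lock"):
        shutil.copytree(src, dst, symlinks=True)


# Small demonstration with a throwaway "branch" directory.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "trunk")
os.mkdir(source)
with open(os.path.join(source, "README"), "w") as handle:
    handle.write("hello\n")

copy_target = os.path.join(workdir, "feature")
locked_copy(source, copy_target)
print(sorted(os.listdir(copy_target)))
```

If another process already holds the source lock, `locked_copy` raises instead of copying a tree that may be changing underneath it, which is the protection a bare 'cp -a' lacks.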

Martin Pool (mbp) wrote :

A full local copy is not what we'd generally be guiding users to do, but it's not an inherently unreasonable thing to do, e.g. when making a backup. It seems that this is slower than it should be, so the bug is still open. However, it's probably not a high priority.

Martin Pool (mbp)
Changed in bzr:
status: Won't Fix → Confirmed
importance: High → Medium
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
Jelmer Vernooij (jelmer)
tags: removed: check-for-breezy