Log performance with stacked branches

Bug #309374 reported by David Reitter
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
Confirmed
Low
Unassigned

Bug Description

Log operations are unacceptably slow with stacked branches.

I have created a branch stacked on the Emacs bzr repository (I have since upgraded locally to 1.10).
Here are some examples of "log" operations on a single file that has just 5 revisions.

~/Projects/Bazaar-Playground/emacs-stacked/msdos$ time bzr log sigaction.c
...
real 1m20.618s
user 0m24.429s
sys 0m2.307s

~/Projects/Bazaar-Playground/emacs-stacked/msdos$ time bzr log --short sigaction.c
....
real 0m34.595s
user 0m15.712s
sys 0m1.036s

Similar operations are much faster with a non-stacked branch.

For comparison:

time cvs log sigaction.c
...
real 0m0.735s
user 0m0.026s
sys 0m0.013s

(Note that this has to query a server, too, and has twice as many revisions for some reason.)

I would probably say that anything <5 seconds for a "log" command would make the command usable.

Btw, thanks for all your work!

Revision history for this message
David Reitter (david-reitter) wrote :

the stacked branch was created with

bzr branch --stacked http://bzr.notengoamigos.org/emacs-ce/trunk/

The performance for "log" with an unstacked branch:

 time bzr log sigaction.c >/dev/null

real 0m25.898s
user 0m24.900s
sys 0m0.595s

time bzr log --short sigaction.c >/dev/null

real 0m15.442s
user 0m15.020s
sys 0m0.400s

Revision history for this message
David Reitter (david-reitter) wrote :

Same problem in 1.13. In the example below (unlike in what I show above), the file in question has only been created in my branch; the branch that it is stacked on shouldn't know anything about this file.

revno: 96979
committer: David Reitter <email address hidden>
branch nick: aquamacs-emacs-stacked
timestamp: Mon 2009-03-23 01:59:14 +0000
message:
  adding Aquamacs-related files from CVS

real 1m48.384s
user 1m21.483s
sys 0m0.848s
BRAEBURN:aquamacs-emacs-stacked dr$ cd ../aquamacs-emacs
BRAEBURN:aquamacs-emacs dr$ time bzr log finish-aquamacs
------------------------------------------------------------
revno: 97005
committer: David Reitter <email address hidden>
branch nick: a-e-stack
timestamp: Sun 2009-03-22 22:25:59 +0000
message:
  build scripts

real 0m24.194s
user 0m22.855s
sys 0m0.576s

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 309374] Re: Log performance with stacked branches

On Mon, 2009-03-23 at 02:50 +0000, David Reitter wrote:
> Same problem in 1.13. In the example below (unlike in what I show
> above), the file in question has only been created in my branch; the
> branch that it is stacked on shouldn't know anything about this file.

We have a major overhaul of disk storage in progress at the moment in
the brisbane-core project; this is expected to have a beta quality
format available in bzr 1.14; performance issues should be tested on
that format (and we are testing with copies of openoffice, emacs, and
mysql) - so I fully expect we'll get some motion on this.

However, right now nearly all our efforts are focused on that, so its
likely that we won't get down to examining the log performance issue
you're seeing here in detail until the new format is in beta.

If you have the time, one thing you could do to help (because there may
be a simple fix we can do) would be to get an lsprof trace:

bzr --lsprof-file foo.callgrind <your operation here>, and attach the
resulting callgrind file - it will tell us what is taking the excess
time in your operation.

Thanks,
Rob

Revision history for this message
David Reitter (david-reitter) wrote :

On Mar 22, 2009, at 11:19 PM, Robert Collins wrote:
> If you have the time, one thing you could do to help (because there
> may
> be a simple fix we can do) would be to get an lsprof trace:

No problem. It's attached.

Revision history for this message
Robert Collins (lifeless) wrote :

btw what parameters did you give log ?

broadly (these don't sum, they are snapshots out of the call graph).
1314 btree nodes read in 212 calls. 24%
however 4.7M nodes examined - 29% of time.
300000 get_parent_map calls.
get_parent_map in iter_ancestry:97K get_parent_map calls 78% of time
filter_fileid_revisions_touching_file: 11%

so in knit.py's get_parent_map there is a function turning 97K external
calls into 190K backing calls. Thats actually reasonable (because its
stacked, and 99% of the history will be in the stacked-on branch
typically.

The most obvious thing: we shouldn't examine all ancestry for this
command (as its clearly per-file).

Secondly, we may be thrashing the B+Tree cache.

I wonder if bzr-revnocache would help.

-Rob

Revision history for this message
John A Meinel (jameinel) wrote :

I would imagine the support for "bzr log REMOTE_BRANCH/file" is not where it could be, since it will invoke VFS operations rather than smart requests.

Changed in bzr:
importance: Undecided → Low
status: New → Confirmed
tags: added: log performance
Jelmer Vernooij (jelmer)
tags: added: stacking
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.