bzr operations in svn working copy are much slower than svn

Bug #318993 reported by Wesley J. Landaker
This bug affects 1 person
Affects: Bazaar Subversion Plugin
Status: Fix Released
Importance: Medium
Assigned to: Jelmer Vernooij

Bug Description

When using bzr-svn in an svn working copy, bzr operations (like status, diff, info, etc.) seem to do a bunch of work, e.g. connect to the server, check the repository layout, etc., whereas the same svn operations are instantaneous. This means that it's at least *possible* for bzr-svn to do some of these operations much faster, but I don't know if it's actually feasible.

The main issue is that this makes using bzr-svn in a lightweight checkout rather slow compared to svn, which is unfortunate. For example, I occasionally *must* use svn to check things out (because I need to use svn --depth magic to get disparate pieces of multi-gigabyte trees), but I'd rather use bzr to actually work on things.

This is of course a wishlist item, but there are some things that might be easy to fix (e.g. why does "bzr info" need to analyze the repository layout and determine changes?)

Jelmer Vernooij (jelmer) wrote : Re: [Bug 318993] [NEW] bzr operations in svn working copy are much slower than svn

Hi Wesley,

On Mon, 2009-01-19 at 22:29 +0000, Wesley J. Landaker wrote:
> When using a bzr-svn in an svn working copy, bzr operations (like
> status, diff, info, etc, etc) seem to do a bunch of work, e.g. connect
> the server, check repository layout, etc, whereas the same svn
> operations are instantaneous. This means that it's at least *possible*
> for bzr-svn to do some of these operations much faster, but I don't know
> if it's actually feasible.
Can you mention the explicit commands you think could be faster ? I
think there are several bugs here, and it would be nice to file them
separately.

It's very hard to get bzr-svn in svn working copies as fast as svn
because of differences between the data models of the two tools. E.g.
bzr requires file ids for all files, and generating those requires
analysing history. Subversion has no concept of branches and Bazaar
does, so bzr-svn has to determine what the branches are.

> The main issue is that this makes using bzr-svn in a lightweight
> checkout rather slow compared to svn, which is unfortunate. For example,
> I occasionally *must* use svn to check things out (because I need to use
> svn --depth magic to get disparate pieces of multi-gigabyte trees), but
> I'd rather actually use bzr to actually work on things.
This particular case is primarily a bzr bug and should be filed
separately - Bazaar always works on complete trees, never partial trees
(such as --depth 1) at the moment.

> This is of course a wishlist item, but there are some things that might
> be easy to fix (e.g. why does "bzr info" need to analyze the repository
> layout and determine changes?)
It needs to count the number of *bzr* revisions in the repository, and
that's not the same as the number of svn revisions. Finding the bzr
revisions requires analysing the repository and determining the changes,
so I don't think "bzr info" can be improved.
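
To illustrate why the two counts differ, here is a rough sketch (not bzr-svn's actual code; the function names and the sample log are made up for illustration). It assumes that a revision counts for a branch when it changes a path under that branch, which ignores complications such as branch copies and renames:

```python
def touches_branch(changed_paths, branch_path):
    """Return True if any changed path is the branch root or lies under it."""
    root = branch_path.strip("/")
    prefix = root + "/"
    return any(p.strip("/") == root or p.strip("/").startswith(prefix)
               for p in changed_paths)


def count_branch_revisions(svn_log, branch_path):
    """Count svn revisions that change something under branch_path.

    The whole log is scanned, so the cost grows with the number of
    svn revisions in the repository.
    """
    return sum(1 for changed_paths in svn_log
               if touches_branch(changed_paths, branch_path))


# Hypothetical repository-wide log: one list of changed paths per svn revision.
svn_log = [
    ["trunk/README"],                      # r1
    ["branches/stable/README"],            # r2
    ["trunk/src/main.c", "trunk/README"],  # r3
    ["tags/1.0/README"],                   # r4
]

print(len(svn_log))                              # svn revisions in the repository
print(count_branch_revisions(svn_log, "trunk"))  # revisions that touch trunk
```

Even in this simplified model, the per-branch count cannot be read off the repository's latest revision number; every revision's changed paths have to be inspected.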

--
Jelmer Vernooij <email address hidden> - http://samba.org/~jelmer/
Jabber: <email address hidden>

Wesley J. Landaker (wjl) wrote :

Sorry to lump multiple issues together.

I filed the --depth wishlist issue as #319256.

If there is something else that should be a separate bug that you see, please point it out, but I think the other issues mostly boil down to doing extra work when it (seemingly) is unnecessary. As I said, it may actually be necessary in practice because of the bzr-svn architecture.

bzr info is a good example:

If I do bzr info -v, I can see how it needs to do all the checking, because that gives me info about the format, working tree, branch history, number of revisions, etc. However, if I just do bzr info without -v, all I get is three lines telling me that it is a checkout and pointing at the branch URL, so it seems like this shouldn't need to do all the extra enumeration since it's not going to print any of it out.

I think bzr status, diff, added, etc. are all similar: they just work on the working tree, so shouldn't need any large amount of history, but they print out a progress bar saying that it's analyzing repository layout and then determining changes for every revision. For example, this seems unnecessary for status and diff (without -r arguments or anything funny like that), since they are only concerned with differences between the working tree and the last revision. Mostly, it's a matter of expecting an O(1) operation and getting O(revisions) instead.

To give a real example, for a large SVN repository with many thousands of revisions, a status of a single directory takes about 1 second with svn status and almost 10 seconds with bzr status. Almost all of the time is spent with the progress bar going through "analyzing repository layout" and then "determining changes ###/####" for every revision. It's better in repositories with fewer revisions, and worse in repositories with more.

This isn't a horrible slowdown per command, but it adds up in everyday use and makes using bzr in this situation feel really sluggish (I usually just end up reverting to using svn in these cases).

P.S. I'm happy to now be reporting mostly wishlist bugs, since that means I'm having a hard time breaking bzr-svn. ;)

Jelmer Vernooij (jelmer) wrote : Re: [Bug 318993] Re: bzr operations in svn working copy are much slower than svn

On Tue, 2009-01-20 at 15:22 +0000, Wesley J. Landaker wrote:
> Sorry to lump multiple issues together.
>
> I filed the --depth wishlist issue as #319256.
Thanks!

> If there is something else that should be a separate bug that you see,
> please point it out, but I think the other issues mostly boil down to
> doing extra work when it (seemingly) is unnecessary. As I said, it may
> actually be necessary in practice because of the bzr-svn architecture.
>
> bzr info is a good example:
>
> If I do bzr info -v, I can see how it needs to do all the checking,
> because that gives me info about the format, working tree, branch
> history, number of revisions, etc. However, if I just do bzr info
> without -v, all I get is three lines telling me that it is a checkout
> and pointing at the branch URL, so it seems like this shouldn't need to
> do all the extra enumeration since it's not going to print any of it
> out.
It can skip the "fetching revision info" step if you disable bzr-svn's
internal caches (use-cache=False in ~/.bazaar/subversion.conf).

It still has to analyse *some* revisions to guess the repository layout,
even for "bzr info", to determine whether the path you have specified is
a branch or not.

> I think bzr status, diff, added, etc, etc, are all similar: they just
> work on the working tree, so shouldn't need any large amount of history,
> but they print out a progress bar saying that it's analyzing repository
> layout and then determining changes for every revision. For example,
> this seems unnecessary for status and diff (without -r arguments or
> anything funny like that), since they are only concerned with
> differences between the working tree and the last revision. Mostly, it's
> a matter of I'm expecting to do a O(1) operation and instead it's
> O(revisions).
As I mentioned earlier, bzr's API (including bzr's working tree
operations) is based around file ids. In order for bzr-svn to come up
with these file ids (that svn doesn't have), it needs to generate them
by analysing history. The only way to fix this would be to significantly
change how Bazaar worked internally, and I don't think this is very
likely.

> This isn't a horrible slowdown per command, but it adds up in everyday
> use and makes using bzr in this situation feel really sluggish (I
> usually just end up reverting to using svn in these cases).
Using bzr formats rather than svn formats locally should be a lot
faster.

The file ids should only be generated once, after that they should be
retrieved from the cache. If you're seeing the analysing history bit
more than once for a single branch, that is definitely a bug.
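
The expected behaviour described above can be sketched as follows. This is only an illustration of "analyse once, then read from the cache", not bzr-svn's real cache; the class, its attributes, and the stand-in analysis function are all hypothetical:

```python
class RevisionCache(object):
    """Hypothetical cache that only analyses revisions it has not seen yet."""

    def __init__(self, analyse_revision):
        self._analyse_revision = analyse_revision  # expensive per-revision step
        self._results = {}      # revnum -> analysis result (e.g. file ids)
        self._last_revnum = 0   # highest revision already analysed
        self.work_done = 0      # number of revisions actually analysed

    def update(self, head_revnum):
        """Bring the cache up to head_revnum, analysing only new revisions."""
        for revnum in range(self._last_revnum + 1, head_revnum + 1):
            self._results[revnum] = self._analyse_revision(revnum)
            self.work_done += 1
        self._last_revnum = max(self._last_revnum, head_revnum)
        return self._results.get(head_revnum)


# Stand-in for the real analysis; the returned string is a placeholder.
cache = RevisionCache(lambda revnum: "file-ids-for-r%d" % revnum)

cache.update(3864)          # first run: every revision is analysed
first_run = cache.work_done
cache.update(3864)          # second run: nothing new, so no extra work
second_run = cache.work_done - first_run
cache.update(3866)          # two new commits: only those are analysed
third_run = cache.work_done - first_run - second_run
print(first_run, second_run, third_run)
```

With a cache of this shape, a repeated "bzr status" on an unchanged repository would do no per-revision work at all, which is why seeing the analysis step on every run points to a bug.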

> P.S. I'm happy to now be reporting mostly wishlist bugs, since that
> means I'm having a hard time breaking bzr-svn. ;)
:-) That's good to hear.
--
Jelmer Vernooij <email address hidden> - http://samba.org/~jelmer/
Jabber: <email address hidden>

Wesley J. Landaker (wjl) wrote :

After giving this some thought, I think the real issue -- for me, anyway -- is that I'm often tied to using svn working copies instead of bzr-svn branches because of the lack of --depth support. Otherwise, I would just not use svn working copies and this bug would be moot.

As far as analyzing history more than once per branch, it's definitely not doing the long and drawn out analysis process that first happens when it initializes the cache, but it is doing something (I think it's the "determining changes" step, but it's hard to tell with the new progress bar in bzr.dev that leaves text artifacts behind) for every revision. Depending on how many revisions are in the repository, it takes anywhere from 2 to 10 seconds, which is not THAT bad ... I just am surprised to see any kind of O(revisions) for something like a plain "bzr status" or "bzr diff".

Anyway, since I split --depth out into a separate bug and the rest of my report is perhaps expected behavior given the bzr-svn architecture, feel free to close this as incomplete/invalid if you don't see anything that sounds abnormal. I can always report back if I find out something more specific.

Jelmer Vernooij (jelmer) wrote :

On Wed, 2009-01-21 at 01:55 +0000, Wesley J. Landaker wrote:
> As far as analyzing history more than once per branch, it's definitely
> not doing the long and drawn out analysis process that first happens
> when it initializes the cache, but it is doing something (I think it's
> the "determining changes" step, but it's hard to tell with the new
> progress bar in bzr.dev that leaves text artifacts behind) for every
> revision. Depending on how many revisions are in the repository, it
> takes anywhere from 2 to 10 seconds, which is not THAT bad ... I just am
> surprised to see any kind of O(revisions) for something like a plain
> "bzr status" or "bzr diff".
Hmm, that's odd - it should only be doing that when there are new
revisions in the repository. Any chance you can interrupt it (Ctrl+\)
and see why it's doing that ?

Cheers,

Jelmer
--
Jelmer Vernooij <email address hidden> - http://samba.org/~jelmer/
Jabber: <email address hidden>

Wesley J. Landaker (wjl) wrote :

Okay, I made a repository with varying numbers of commits and timed the average of "bzr status" (not counting the first run). Here are the results so far:

~100 revisions: ~1.0 seconds
~300 revisions: ~1.5 seconds
~1000 revisions: ~3 seconds
~3500 revisions: ~6 seconds

(I can keep going, but it takes a while to generate SVN repositories with more revisions.)

I've attached a test repository, generated by looping the "make_a_commit" script that's included in the repository itself. You can add more revisions and see it get worse and worse (e.g. "svn revert -R .; while true; do ./make_a_commit; done" then wait for a while, ^C^C^C when you get a few more thousand revisions.)

If you unpack this and go to the wc subdirectory and run "bzr status", or "bzr diff", or whatever, you should be able to see it doing what I've described. You should be able to clearly see that every time you run it, it is doing _something_ over every revision that takes several seconds.

Anyway, this should give you an easy way to see and/or debug this behavior for yourself, but just to answer your question about doing ^\ and seeing where it is, here is a pdb bt after randomly doing a ^\ in the middle of whatever it is doing with all the revisions:

$ bzr status
^\** SIGQUIT received, entering debuggeres:analyzing repository layout:determining changes 877/3864
** Type 'c' to continue or 'q' to stop the process
** Or SIGQUIT again to quit (and possibly dump core)
> /home/wjlanda/lib/python/bzrlib/breakin.py(33)_debug()
-> signal.signal(signal.SIGQUIT, _debug)
(Pdb) bt
  /home/wjlanda/bin/bzr(130)<module>()
-> exit_val = bzrlib.commands.main(sys.argv)
  /home/wjlanda/lib/python/bzrlib/commands.py(884)main()
-> ret = run_bzr_catch_errors(argv)
  /home/wjlanda/lib/python/bzrlib/commands.py(893)run_bzr_catch_errors()
-> return run_bzr(argv)
  /home/wjlanda/lib/python/bzrlib/commands.py(839)run_bzr()
-> ret = run(*run_argv)
  /usr/lib/python2.5/site-packages/bzrlib/plugins/loom/commands.py(172)run_argv_aliases()
-> self._original_command().run_argv_aliases(argv, alias_argv)
  /home/wjlanda/lib/python/bzrlib/commands.py(539)run_argv_aliases()
-> return self.run(**all_cmd_args)
  /home/wjlanda/lib/python/bzrlib/commands.py(853)ignore_pipe()
-> result = func(*args, **kwargs)
  /home/wjlanda/lib/python/bzrlib/builtins.py(215)run()
-> tree, relfile_list = tree_files(file_list)
  /home/wjlanda/lib/python/bzrlib/builtins.py(64)tree_files()
-> return internal_tree_files(file_list, default_branch, canonicalize)
  /home/wjlanda/lib/python/bzrlib/builtins.py(105)internal_tree_files()
-> return WorkingTree.open_containing(default_branch)[0], file_list
  /home/wjlanda/lib/python/bzrlib/workingtree.py(333)open_containing()
-> return control.open_workingtree(), relpath
  /home/wjlanda/.bazaar/plugins/svn/workingtree.py(830)open_workingtree()
-> return SvnWorkingTree(self, self.local_path, self.open_branch())
  /home/wjlanda/.bazaar/plugins/svn/workingtree.py(114)__init__()
-> self._update_base_revnum(max_rev)
  /home/wjlanda/.bazaar/plugins/svn/workingtree.py(549)_update_base_revnum()
-> self.base_revid = self.branch.generate_revision_id(fetched)
  /home/wjlanda/.bazaar/plugins/svn/branch.py(25...


Wesley J. Landaker (wjl) wrote :

Just another data point: I'm up to about 8000 revisions, and now bzr status takes about 7.5 s.

I'm going to let this keep going and see how bad it gets. ;)

Wesley J. Landaker (wjl) wrote :

Up to about 10000 revisions and it takes about 10 seconds, so it seems that, very roughly, every 1000 revisions add one second on my machine. I'm going to let this run overnight to get a several-hundred-thousand-revision svn repository and see how that behaves.

Jelmer Vernooij (jelmer) wrote : Re: [Bug 318993] [NEW] bzr operations in svn working copy are much slower than svn

Thanks for the analysis. I'll just consider this bug to be about
improvements in the caching in bzr-svn then :-)

  status triaged
  importance medium
--
Jelmer Vernooij <email address hidden> - http://samba.org/~jelmer/
Jabber: <email address hidden>

Changed in bzr-svn:
importance: Undecided → Medium
status: New → Triaged

Wesley J. Landaker (wjl) wrote :

Okay, just one last result, since I ran this overnight (which gave me fewer revisions than I expected, but still a lot). I've also attached this last repository since it might be helpful in testing and might take a long time to generate yourself.

This also makes it easy to see that there are two separate passes through the revisions. It looks like it says "determining changes" (counts all the revisions, but only takes ~1 second), then "analyzing repository layout" (< 1 second), then "determining changes" again (the rest of the time, the O(revisions) factor). Since the first pass takes only about 1 second even for 73000 revisions, it isn't the problem. The second is where the rest of the time goes.

Anyway, I get:

~73000 revisions: ~73 seconds

Which is just what I'd expect given this behavior.
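
As a quick sanity check on the linear scaling, the approximate timings reported in this thread can be put through a small script. This is just back-of-the-envelope arithmetic on the numbers above; the helper names are made up, and the fit assumes a straight line through the origin, which ignores any fixed startup cost:

```python
# (revisions, seconds) pairs as reported in this thread; all values approximate.
timings = [
    (100, 1.0), (300, 1.5), (1000, 3.0), (3500, 6.0),
    (8000, 7.5), (10000, 10.0), (73000, 73.0),
]


def seconds_per_thousand(revisions, seconds):
    """Time spent per 1000 revisions for one measurement."""
    return seconds / revisions * 1000.0


def slope_through_origin(points):
    """Least-squares slope k for the model seconds = k * revisions."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, y in points)
    return sxy / sxx


for revisions, seconds in timings:
    print("%6d revisions: %.2f s per 1000 revisions"
          % (revisions, seconds_per_thousand(revisions, seconds)))

k = slope_through_origin(timings)
print("fitted cost: about %.2f ms per revision" % (k * 1000.0))
```

The small repositories show a higher per-revision figure because a roughly constant startup cost dominates there, while the larger ones settle near one second per 1000 revisions. The fit is weighted heavily toward the 73000-revision measurement.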

Anyway, I'm done analyzing for now unless you want me to try something else specific.

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

I need to fix the caching here I think, there are some things which should be fixed but aren't.

Changed in bzr-svn:
assignee: nobody → jelmer
milestone: none → 0.5.1

Jelmer Vernooij (jelmer) wrote :

With Wesley's 73k test set, this now takes a couple of minutes initially and after that, most operations (not involving history) are pretty quick (less than 3 seconds).

Jelmer Vernooij (jelmer)
Changed in bzr-svn:
status: Triaged → Fix Released