Slow commits when there are many tags in the repository

Bug #200205 reported by Mattias Eriksson
Affects: Bazaar Subversion Plugin
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

I have been discussing this with Jelmer over mail; I'm just creating a bug so that this will not be forgotten or overlooked.

> > I saw you had made some caching fix in bzr-svn to make commits faster. I
> > was happy to see this since I am trying to get bzr into my company, which
> > currently works with svn. However, the repository has about 45000
> > revisions, 500000 lines of code and about 4500 files, and on this
> > repository commit is still really slow (well, I canceled after a while
> > and used svn since it just took too long). Is it a known issue that it is
> > very slow in these kinds of repositories, or could I help you with some
> > debugging/profiling?
> How exactly are you using bzr-svn? Are you running "bzr commit" in a
> Subversion checkout (-d .svn) or in a Bazaar checkout (-d .bzr)?
>
> In the first case, bzr-svn will have to do some analysis of the
> repository that can be cached, so it will be a one-time thing. This
> shouldn't take very long but it may suffer from the same problems as
> cloning the repository. It would be interesting to know which part of
> this process is taking very long in your case.
>
> The best way to do debugging is to run bzr with the arguments
> -Dtransport -Dfetch, which will cause bzr-svn to write debug output to
> ~/.bzr.log.
>
I'll attach the log. It is not of a full commit, since I didn't wait for it to complete. The first attempt was aborted quite fast; for the second one I waited one minute after I had written the commit message.

I see that the last line before I abort is
25.522 svn ls -r 1 ''tags''

and if I do:
svn ls svn://oof/ardome/tags| wc -l
3500

Yes, I know it is kind of weird to have 3500 tags in a repository... but the way we do it, we have multiple tags per snapshot of our product, and we take a snapshot of every minor delivery to be able to track things...
Can't this be avoided in any way? I commit to trunk, so why are tags and branches checked?

Mattias Eriksson (snaggen) wrote :

Jelmer Vernooij (jelmer) wrote : Re: [Bug 200205] [NEW] Slow commits when there are many tags in the repository

  status triaged
  importance low
--
Jelmer Vernooij <email address hidden> - http://samba.org/~jelmer/
Jabber: <email address hidden>

Changed in bzr-svn:
importance: Undecided → Low
status: New → Triaged
Mattias Eriksson (snaggen) wrote :

Ok, just to be clear... I do not have any deep knowledge about bzr/svn/bzr-svn internals so I may be just talking rubbish here...

Anyway... I was digging into the bzr-svn code and found that the problem is inside the
repository.py: def find_branchpaths(self, scheme, from_revnum=0, to_revnum=None)
function, which according to its comment tries to find all branch paths that were changed in the specified revision range. As I said, I don't know enough subversion/python/bzr internals to understand the exact semantics of the code, but as I understand it from the logs, it does an ls of branches and tags and then tries to process each of these...

But if we just want to find the branches that were changed in a specific revision range, I think it might be faster to use svn log on the repository. If I have a repo with this layout:
project/trunk
project/branches/A
project/branches/B
project/tags/A
project/tags/B

Then if I run svn log -v svn://host/project/ from the command line, I get a listing that includes all the files that have changed, so my not so qualified guess is that using that might scale better in this case... but unfortunately I don't know the code well enough to test this.
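
The idea above can be illustrated with a minimal sketch. It is not bzr-svn code: it only assumes the XML form of the verbose log (svn log -v --xml) and the project/trunk, project/branches/X, project/tags/X layout from the example. The function names and the sample log are hypothetical.

```python
import xml.etree.ElementTree as ET


def branch_root(path, project="project"):
    """Map a changed path to its branch root, or None if it is outside the layout."""
    parts = path.strip("/").split("/")
    if not parts or parts[0] != project:
        return None
    rest = parts[1:]
    if rest[:1] == ["trunk"]:
        return f"{project}/trunk"
    if len(rest) >= 2 and rest[0] in ("branches", "tags"):
        return f"{project}/{rest[0]}/{rest[1]}"
    return None


def changed_branch_paths(log_xml, project="project"):
    """Collect the branch roots touched by any revision in `svn log -v --xml` output."""
    roots = set()
    for path_el in ET.fromstring(log_xml).iter("path"):
        root = branch_root(path_el.text or "", project)
        if root is not None:
            roots.add(root)
    return roots


# Hypothetical sample in the shape produced by `svn log -v --xml`.
SAMPLE_LOG = """<log>
<logentry revision="3">
<paths>
<path action="M" kind="file">/project/trunk/main.c</path>
</paths>
</logentry>
<logentry revision="2">
<paths>
<path action="A" kind="dir" copyfrom-path="/project/trunk" copyfrom-rev="1">/project/tags/A</path>
</paths>
</logentry>
</log>"""

print(sorted(changed_branch_paths(SAMPLE_LOG)))
```

Only paths that appear in the log are seen here, so the cost depends on the revision range rather than on the number of tags in the repository.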

Jelmer Vernooij (jelmer) wrote :

It already uses "svn log" and only uses "svn ls" in a couple of corner cases (when copies from older parts of history are encountered).

self.transport.get_dir() is the bit which causes "svn ls" to be run.

Mattias Eriksson (snaggen) wrote :

Yes, I saw that it was get_dir doing the actual listing... but I assume that is too far down the stack to avoid the tags listing.

Trying to see what is happening here...
So we have the list of files that have actually changed, and I guess we know all the relevant tags and branches to process from this list... so is there a need to do an svn ls of the branches and tags directories? Couldn't that list of branches/tags be used if any further processing of the branches/tags must be done for the corner cases?

Jelmer Vernooij (jelmer) wrote :

I'm afraid this is a tradeoff issue. If you have a very long history, using the "svn log" output and tracking all the copies there can be very, very expensive and complex to get right. "svn ls" otoh lets the server take care of figuring out what dirs/files are in a directory, which is always correct and easier to deal with, code-wise (since we let the server handle the complexity).

We could add code to bzr-svn that would use "svn ls" only if the history is very large and look at history ourselves in all other cases. However, other than your use case, there are not a lot of reasons to do so at the moment.
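
A rough sketch of the tradeoff described above, under stated assumptions: it is not taken from bzr-svn, the function names and the threshold value are hypothetical, and the log is modelled as plain (revnum, [(action, path), ...]) tuples rather than real Subversion data. It replays log entries to rebuild the listing of a directory, which is what "svn ls" would otherwise get from the server.

```python
HISTORY_THRESHOLD = 10000  # hypothetical cut-off, not a bzr-svn setting


def list_dir_from_log(entries, directory):
    """Replay log entries oldest-first and return the live direct children of `directory`.

    Simplification: a copy, rename or delete of `directory` itself (or of one of
    its parents) is NOT handled. Children that arrive that way never show up as
    individual "A" entries, which is the kind of case that makes tracking copies
    through the log hard to get right.
    """
    prefix = directory.rstrip("/") + "/"
    live = set()
    for _revnum, changes in sorted(entries, key=lambda e: e[0]):
        for action, path in changes:
            if not path.startswith(prefix):
                continue
            child = path[len(prefix):]
            if "/" in child:
                continue  # only direct children matter for a listing
            if action in ("A", "R"):  # added or replaced
                live.add(child)
            elif action == "D":  # deleted
                live.discard(child)
    return live


def choose_strategy(num_revisions, threshold=HISTORY_THRESHOLD):
    """Use "ls" only for very long histories, and the log replay otherwise."""
    return "ls" if num_revisions > threshold else "log"


entries = [
    (1, [("A", "/project/tags/A")]),
    (2, [("A", "/project/tags/B"), ("A", "/project/tags/B/file.c")]),
    (3, [("D", "/project/tags/A")]),
]
print(list_dir_from_log(entries, "/project/tags"))
print(choose_strategy(45000))
```

The replay costs time proportional to the length of the history, while a server-side listing costs time proportional to the number of entries in the directory, which is why neither approach wins in every repository.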

Mattias Eriksson (snaggen) wrote :

I see, and I realize that my case with that many tags is not the most common case.

Anyway, for my case I do not see why this is done... it shouldn't be a corner case. I was working in trunk, my working tree was up to date, and I had a bound branch... I can see that if there is merging/branching going on there might be a need to figure out what has been done, but for a bound branch that is up to date I do not understand why we are doing this in the first place. And in your comment above you write that it is only used when copies are encountered, but since my branch is up to date I don't see how this corner case is triggered.

I realize that I'm probably missing something very obvious here, but I hope you will have some patience with me... just trying to understand what is going on.

Jelmer Vernooij (jelmer) wrote : Re: [Bug 200205] Re: Slow commits when there are many tags in the repository

It happens not just when there are copies in your current trunk. It
happens if there are copies anywhere in history for tags/branches.

--
Jelmer Vernooij <email address hidden> - http://samba.org/~jelmer/
Jabber: <email address hidden>
