Comment 20 for bug 405317

Robert Collins (lifeless) wrote : Re: [Bug 405317] Re: Huge Performance regressions in bazaar 1.16 and higher

Thanks for the additional callgrind - very interesting.

I've filed a bug on filter support - filters are causing an early read
and may provoke memory bloat; I don't think that is your specific
problem, but it's worth knowing about so that we can fix it.

 - 30% of the commit time was reading files from disk
 - 60% was doing heads calculations

We read 8521 files from disk, about 400 more than have changed. This may
be a bug in filters causing status to think more things were changed
than were. Or it may be a result of the merge that is taking place. Were
you using filters before you upgraded?

We do 8521 heads() calculations, which precisely matches the number of
files read from disk. This makes me lean towards suspecting the merge
code path.

heads() is fast-pathed to return immediately if there is only one head.
So bzr thinks that in the merge you are doing:
Working: [parentA, parentB]
there are 8521 files which have a different last-changed revision in
parent A and parent B.
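
To make that concrete, here is a minimal sketch of the idea in Python -
not bzrlib's actual heads() implementation, just an illustration of why
one candidate is free but two candidates force an ancestry walk, which
is what we are doing 8521 times in your commit:

    def heads(graph, candidates):
        """Return the candidates that are not ancestors of another candidate.

        `graph` maps revision id -> tuple of parent revision ids.
        """
        candidates = set(candidates)
        if len(candidates) == 1:
            # Fast path: a single candidate is trivially the only head.
            return candidates
        result = set(candidates)
        for start in candidates:
            # Walk the ancestry of each candidate; any other candidate we
            # reach is an ancestor, so it cannot be a head.
            todo = list(graph.get(start, ()))
            seen = set()
            while todo:
                rev = todo.pop()
                if rev in seen:
                    continue
                seen.add(rev)
                result.discard(rev)
                todo.extend(graph.get(rev, ()))
        return result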

Tell me, do you have a pattern where you repeatedly merge from some
branch B, but never merge back into that branch? Or vice versa -
repeatedly merge into B but never merge from it?

If so, this will slowly accumulate a larger and larger set of differing
last-changed fields, and bzr will repeatedly check whether the other
branch has in fact changed.
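
To illustrate the accumulation I mean, here is a rough model in Python -
my assumption about the mechanism, not bzr's real data structures. Trunk
keeps touching files that branch B never sees, so the set of files whose
last-changed differs between the two merge parents grows with every
merge:

    def simulate(n_merges, files_touched_between_merges):
        trunk = {}   # file id -> last-changed revision recorded on trunk
        branch = {}  # file id -> last-changed revision recorded on branch B
        next_file = 0
        for merge_no in range(1, n_merges + 1):
            # Work on trunk since the last merge: these files pick up new
            # last-changed values that B never sees, because trunk is never
            # merged back into B.
            for _ in range(files_touched_between_merges):
                trunk[next_file] = ('trunk', merge_no)
                branch[next_file] = ('base', 0)
                next_file += 1
            # At the next "merge B into trunk" commit, every file whose
            # last-changed differs between the two parents needs a heads()
            # call.
            differing = sum(1 for f in branch if branch[f] != trunk.get(f))
            print('merge %d: %d files need heads()' % (merge_no, differing))

    simulate(n_merges=5, files_touched_between_merges=100)

The count it prints grows with every merge - that growth is the cost I
suspect you are seeing.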

The rest of this mail assumes that you are repeatedly merging from a
branch which is never merged into (or merging into a branch that is
never merged from).

I wonder/suspect that this is happening in the other commit code path
as well, but I don't know what we changed there to provoke it. Perhaps
it is just an efficiency bug in the btree support code (which is where
50% of the commit time goes).

A few things we can do:
 - stop reading from the local disk when a file has not changed from
   basis->working tree, if we're only examining it because it shows up
   in a basis->merge_parent delta (sketched after this list). That will
   save 25% of the commit time for you.
 - We could do a merge-basis calculation and discard last-changed fields
  from the source which are not in the set of revisions being merged.
  I'm not sure this would always give correct results - I'm speculating
  about doing it.
 - make heads faster; consider using the graph preprocessing heads
   tool we have now, or tweaking the graph layer more.
 - be more selective about what components of the basis->merge parent
   delta we include *in the case that the content hasn't changed*. Again
   this could lead to incorrect file graphs (failing to converge on
   flip-flop changes on one side), so we'll need to be careful about
   whether, or how, we do this.
 - ???
 - profit.
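
For the first item above, a hypothetical sketch of the check (the names
and exact inputs are illustrative, not bzrlib API):

    def need_file_read(basis_sha1, working_sha1, only_in_merge_delta):
        """Decide whether commit must re-read this file's text from disk.

        basis_sha1          -- sha1 recorded for the file in the basis tree
        working_sha1        -- sha1 the working tree reports (typically
                               answered from the stat/hash cache, not a
                               full read)
        only_in_merge_delta -- True when the only reason we are looking at
                               the file is that it shows up in a
                               basis->merge_parent delta
        """
        if only_in_merge_delta and basis_sha1 == working_sha1:
            # Unchanged locally; commit can reuse the recorded text
            # without touching the disk.
            return False
        return True

    # A file unchanged locally that only shows up via the merge parent's
    # delta would be skipped:
    print(need_file_read('abc123', 'abc123', True))   # False - skip read
    print(need_file_read('abc123', 'def456', True))   # True - must read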

So, it's my weekend now, but I'll spin off a bug for this on Monday,
trying to break it down into components, get some info together, and
see about a hot fix for you.

-Rob