bzr status after merge slower with packs

Bug #172657 reported by John A Meinel
12
Affects Status Importance Assigned to Milestone
Bazaar
Fix Released
Medium
John A Meinel

Bug Description

After doing a 'bzr merge' when running 'bzr status' it takes longer than
expected to display the revisions that are now merged.

It very quickly displays the list of changed files, and gets to:
pending merges:

And then it sits for a while while it extracts the revisions it needs.

Currently, to determine the merge ancestry (what revisions in the merged parent
are new and not already present in the target) it extracts the full revision
history, and then does (effectively) a set difference.

I tracked it down into knit_repo.get_ancestry() calls, which is called 2 times
(one for each head). Out of 18s for the complete command 14.3s is spent in
_lookup_keys_via_location, and 17.5 is spent it KnitRepository.get_ancestry().

We probably could use the Graph LCA code to find nodes which are present in one
head that are not in the other one, without having to search the entire history.

I'll post the full callgrind in a second, but the quick lsprof summary is:

    26703 14.297 6.482 bzrlib.index:563(_lookup_keys_via_location)
  +698272 5.921 2.081 +bzrlib.index:422(_parsed_key_index)
   +31642 0.272 0.259 +bzrlib.index:380(_resolve_references)
  +382215 0.166 0.166 +<method 'append' of 'list' objects>
  +349089 0.122 0.122 +<len>
   +53406 1.326 0.038 +bzrlib.index:945(_read_and_parse)
     +374 0.003 0.001 +bzrlib.index:409(_parsed_byte_index)
     +521 0.000 0.000 +<method 'update' of 'set' objects>

There are some other issues here, though.

We are looking up approximately 15k revisions, (though we are doing it 2
times). Presumably we get 26.7k calls because all of those merge revisions get
to look up both parents at the same time.

I don't quite understand how that translates into 700k calls to _parsed_key_index.

I think we should possibly re-evaluate how the bisection is working.
Considering that reading from the disk is only costing 1.3s, it seems that we
have a sub-optimal memory layout. (It is costing 5.9s to bisect through the
memory structure for every key we are looking for.)

I'm not sure if you looked at my bisect_multi code for dirstate, which did use
a sorted request so that it could quickly partition the requested keys. It
might be worthwhile to do that with the memory structure.

Revision history for this message
John A Meinel (jameinel) wrote :
Changed in bzr:
status: New → Triaged
John A Meinel (jameinel)
Changed in bzr:
importance: Undecided → Medium
Revision history for this message
John A Meinel (jameinel) wrote :

In the associated branch, we use Graph.find_difference() rather than grabbing the full ancestry of all revisions.

Changed in bzr:
assignee: nobody → jameinel
milestone: none → 1.0alpha2
status: Triaged → Fix Committed
Revision history for this message
Aaron Bentley (abentley) wrote :

Note that Graph.find_difference can be fooled by shortcuts, so it is not advisable for general use.

Revision history for this message
John A Meinel (jameinel) wrote :

Can you give an example?

I would be happy to spend some time updating find_difference, since it would be a very useful function to have.

Revision history for this message
Robert Collins (lifeless) wrote :

Should this be Fix released now?

Revision history for this message
Stefan Monnier (monnier) wrote : Re: [Bug 172657] Re: bzr status after merge slower with packs

> Should this be Fix released now?

Kind of. In 99% of the cases, the behavior with bzr-1.5 is OK
(i.e. listing the pending merges does not noticeably slowdown the whole
"bzr status" command, tho it commonly takes about 50% of the time.
But in the case of a large and/or complex merge, it can still easily
dominate the total time).

        Stefan "Still wondering why the pending merges are included
                in `bzr status' rather than having a separate command".

> --
> bzr status after merge slower with packs
> https://bugs.launchpad.net/bugs/172657
> You received this bug notification because you are a direct subscriber
> of a duplicate bug.

> Status in Bazaar Version Control System: Fix Committed

> Bug description:
> After doing a 'bzr merge' when running 'bzr status' it takes longer than
> expected to display the revisions that are now merged.

> It very quickly displays the list of changed files, and gets to:
> pending merges:

> And then it sits for a while while it extracts the revisions it needs.

> Currently, to determine the merge ancestry (what revisions in the merged parent
> are new and not already present in the target) it extracts the full revision
> history, and then does (effectively) a set difference.

> I tracked it down into knit_repo.get_ancestry() calls, which is called 2 times
> (one for each head). Out of 18s for the complete command 14.3s is spent in
> _lookup_keys_via_location, and 17.5 is spent it KnitRepository.get_ancestry().

> We probably could use the Graph LCA code to find nodes which are present in one
> head that are not in the other one, without having to search the entire history.

> I'll post the full callgrind in a second, but the quick lsprof summary is:

> 26703 14.297 6.482 bzrlib.index:563(_lookup_keys_via_location)
> +698272 5.921 2.081 +bzrlib.index:422(_parsed_key_index)
> +31642 0.272 0.259 +bzrlib.index:380(_resolve_references)
> +382215 0.166 0.166 +<method 'append' of 'list' objects>
> +349089 0.122 0.122 +<len>
> +53406 1.326 0.038 +bzrlib.index:945(_read_and_parse)
> +374 0.003 0.001 +bzrlib.index:409(_parsed_byte_index)
> +521 0.000 0.000 +<method 'update' of 'set' objects>

> There are some other issues here, though.

> We are looking up approximately 15k revisions, (though we are doing it 2
> times). Presumably we get 26.7k calls because all of those merge revisions get
> to look up both parents at the same time.

> I don't quite understand how that translates into 700k calls to _parsed_key_index.

> I think we should possibly re-evaluate how the bisection is working.
> Considering that reading from the disk is only costing 1.3s, it seems that we
> have a sub-optimal memory layout. (It is costing 5.9s to bisect through the
> memory structure for every key we are looking for.)

> I'm not sure if you looked at my bisect_multi code for dirstate, which did use
> a sorted request so that it could quickly partition the requested keys. It
> might be worthwhile to do that with the memory structure.

Revision history for this message
John A Meinel (jameinel) wrote :

I'm not sure about the milestone for this, for some reason I wasn't able to find the comment in NEWS, I don't think it was 1.0rc2, though.

In general, it is a lot better than it used to be. So I'm marking this Fix Released. The Graph.find_unique_ancestors() does walk the graph more than it probably has to, and suffers from some scaling issues. The nice thing about the old code, is it only traversed the graph 2x. It just traversed the *entire* graph 2x. The new one has the possibility of traversing the same nodes repeatedly.

I think it would be great to revisit it, but I think it is "good enough" for now. (I would *like* to do a node-coloring version. In my initial testing, the overhead of creating an object for each node overwhelmed the other advantages. Perhaps a pyrex version could create lighter-weight objects.

Changed in bzr:
status: Fix Committed → Fix Released
Revision history for this message
Robert Collins (lifeless) wrote :

On Fri, 2008-07-18 at 13:57 +0000, Stefan Monnier wrote:
>
> Stefan "Still wondering why the pending merges are included
> in `bzr status' rather than having a separate
> command".

The design goal of status is 'show the user all the effects of running
commit'. We can definitely have an additional command for just pending
merges.

-Rob

--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
| On Fri, 2008-07-18 at 13:57 +0000, Stefan Monnier wrote:
|> Stefan "Still wondering why the pending merges are included
|> in `bzr status' rather than having a separate
|> command".
|
| The design goal of status is 'show the user all the effects of running
| commit'. We can definitely have an additional command for just pending
| merges.
|
| -Rob
|

Well, if we just showed the direct pending merge, it could be very fast.
Showing the extra revisions is the expensive portion. :)

And often, they are a bit of noise when you have 100+ of them.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkiDSZAACgkQJdeBCYSNAANzUQCff04GCiVH6N8dwpWShko0ANhT
vrcAnA1icnF0QqPs/u9W8sGpHmOUaIjQ
=NARo
-----END PGP SIGNATURE-----

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

John A Meinel wrote:
> Robert Collins wrote:
> | On Fri, 2008-07-18 at 13:57 +0000, Stefan Monnier wrote:
> |> Stefan "Still wondering why the pending merges are included
> |> in `bzr status' rather than having a separate
> |> command".
> |
> | The design goal of status is 'show the user all the effects of running
> | commit'. We can definitely have an additional command for just pending
> | merges.
> |
> | -Rob
> |
>
> Well, if we just showed the direct pending merge, it could be very fast.
> Showing the extra revisions is the expensive portion. :)
>
> And often, they are a bit of noise when you have 100+ of them.

I strongly agree. I'd like the extra information to just be shown in
verbose mode. So, this is how I'd like status to work ...

  $ bzr status
  modified:
    foo
    bar
  pending merge tips:
    Who When What

  $ bzr status -v
  modified:
    foo
    bar
  pending merges (* = tip):
  * Who When What
    Who When What
    Who When What

That would be faster by default and make the output after merging from trunk
much more useful (to me). It would also clear up the confusion about why
status does the indenting it currently does.

If no-one disagrees, I'm happy to throw together a patch for this.

Ian C.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.