slower log in 1.6 than 1.5

Bug #257180 reported by Martin Pool on 2008-08-12
Affects Status Importance Assigned to Milestone
Bazaar
High
Unassigned
1.6
High
Unassigned

Bug Description

bzr log, especially on a single file, is slower in 1.6 than in 1.5 and 1.3.

More details and an analysis of the problem are here: <https://lists.ubuntu.com/archives/bazaar/2008q3/045736.html>

Martin Pool (mbp) on 2008-08-12
Changed in bzr:
importance: Undecided → High
milestone: none → 1.6
status: New → Confirmed

Thank you for investigating this more. I filed
<https://bugs.launchpad.net/bzr/+bug/257180>

It would be a shame to have this regression in 1.6. It seems like one
practical measure would be to restore the load of the whole index in
this case, and at least see how that compares. This would be a step
back from the kind of index behaviour we generally want, but perhaps
the best we can do for current pack indexes. Perhaps we could do this
through an api that is just a hint, and less specific than
_buffer_all, so that other repositories can just ignore it.

--
Martin <http://launchpad.net/~mbp/>
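Martin's suggestion of an advisory call, rather than a direct request to buffer, can be sketched as follows. This is a minimal illustration that assumes a hypothetical `hint_bulk_lookup()` method on a hypothetical `KeyIndex` base class; neither is part of bzrlib. The default implementation ignores the hint, so an index format with no use for it can leave it alone, while a pack-style index may choose to load all of its keys.

```python
import bisect


class KeyIndex:
    """Hypothetical sorted key index; stands in for any repository index."""

    def __init__(self, keys):
        self._keys = sorted(keys)

    def hint_bulk_lookup(self, expected_keys):
        """Advise that roughly `expected_keys` lookups are about to follow.

        Purely advisory: this default does nothing, so formats that have
        no use for the hint can ignore it without affecting callers.
        """

    def contains(self, key):
        # Binary search over the sorted keys
        i = bisect.bisect_left(self._keys, key)
        return i < len(self._keys) and self._keys[i] == key


class BufferingKeyIndex(KeyIndex):
    """Hypothetical index that loads every key into memory when hinted."""

    def __init__(self, keys):
        super().__init__(keys)
        self._all_keys = None  # filled in only if the hint is acted on

    def hint_bulk_lookup(self, expected_keys):
        # Acting on the hint is optional; results are identical either way
        if expected_keys > 0 and self._all_keys is None:
            self._all_keys = set(self._keys)

    def contains(self, key):
        if self._all_keys is not None:
            return key in self._all_keys
        return super().contains(key)


def present_keys(index, wanted):
    """Return the keys from `wanted` that the index contains, in order."""
    wanted = list(wanted)
    index.hint_bulk_lookup(len(wanted))
    return [key for key in wanted if index.contains(key)]


keys = [("file-a", "rev-%d" % i) for i in range(0, 20, 2)]
wanted = [("file-a", "rev-%d" % i) for i in range(5)]
plain, buffering = KeyIndex(keys), BufferingKeyIndex(keys)
print(present_keys(plain, wanted))
print(present_keys(buffering, wanted))
```

Callers can always send the hint; whether to spend the memory on it is left to each index implementation.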

On Wed, 2008-08-13 at 01:07 +0000, Martin Pool wrote:
>
> It would be a shame to have this regression in 1.6. It seems like one
> practical measure would be to restore the load of the whole index in
> this case, and at least see how that compares. This would be a step
> back from the kind of index behaviour we generally want, but perhaps
> the best we can do for current pack indexes. Perhaps we could do this
> through an api that is just a hint, and less specific than
> _buffer_all, so that other repositories can just ignore it.

I have repositories where doing this will OOM bzr :) I realise that they
are somewhat extreme :(. I agree it's not a desirable regression, but it
simply highlights the known deficiencies of 'graphindex' more.

-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

John A Meinel (jameinel) wrote:

Are you sure they wouldn't OOM with the existing code? Consider that it does:

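```python
# One (file_id, revision_id) key per revision in the ancestry
all_file_ids = [(file_id, revision_id) for revision_id in ancestry]
# Ask the repository's indexes which of those keys actually exist
existing_file_ids = repository.get_parent_map(all_file_ids)
```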

The problem is that in the majority of cases the (file_id, revision_id) key is not present (it never existed), so we have to bisect-search for it in every pack file that we have.

For something with real ancestry, generally 95% of the keys are invalid, and looking them up probably requires bisecting most of the possible pages (perhaps not, depending on the locality of file_id).

As you have already converted that repo over to btree, I'm not sure it is a strictly valid data point.
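To see why the missing keys dominate, here is a toy model of the lookup pattern described above. It assumes each pack has a hypothetical `PackIndex` of sorted keys split into fixed-size pages, and that a lookup bisects over pages, reading one page per probe; it is not bzr's `GraphIndex` code, only an illustration of the access pattern. A key that exists stops at the first pack containing it, but a key that never existed has to be searched for in every pack.

```python
import bisect


class PackIndex:
    """Hypothetical per-pack index: sorted keys stored in fixed-size pages."""

    def __init__(self, keys, page_size=4):
        self.keys = sorted(keys)
        self.page_size = page_size
        self.page_reads = 0  # pages touched by lookups so far

    def _read_page(self, n):
        self.page_reads += 1
        start = n * self.page_size
        return self.keys[start:start + self.page_size]

    def contains(self, key):
        # Bisect over pages; each probe costs one page read
        lo = 0
        hi = (len(self.keys) + self.page_size - 1) // self.page_size
        while lo < hi:
            mid = (lo + hi) // 2
            page = self._read_page(mid)
            if key < page[0]:
                hi = mid
            elif key > page[-1]:
                lo = mid + 1
            else:
                i = bisect.bisect_left(page, key)
                return i < len(page) and page[i] == key
        return False


def find_present(packs, wanted):
    """Return the wanted keys found in any pack."""
    found = set()
    for key in wanted:
        for pack in packs:  # a missing key falls through every pack
            if pack.contains(key):
                found.add(key)
                break
    return found


# Hypothetical data: 100 revisions spread over 3 packs; file "f" was
# changed in only 3 of them, so 97 of the 100 keys asked for never existed.
revisions = ["rev-%03d" % i for i in range(100)]
touched = {"rev-000", "rev-050", "rev-099"}
packs = []
for p in range(3):
    chunk = revisions[p * 34:(p + 1) * 34]
    keys = [("other-file", r) for r in chunk]
    keys += [("f", r) for r in chunk if r in touched]
    packs.append(PackIndex(keys))

wanted = [("f", r) for r in revisions]  # one key per ancestor
present = find_present(packs, wanted)
print(len(present), "keys found,", sum(p.page_reads for p in packs), "page reads")
```

Each of the 97 absent keys costs at least one page read in each of the three packs, and that cost grows with both the number of packs and the length of the ancestry.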

John A Meinel (jameinel) wrote:

A partial fix was merged into 1.6, which causes the whole index to be buffered if we request more than 5% of its keys.

There are still other areas where log is a bit slower, so I'm not quite ready to mark this "Fix Released", but the bulk of the regression has been fixed. Perhaps this should be Fix Released and a new bug opened?
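The partial fix described in this comment, buffering the whole index once a request covers more than 5% of its keys, can be sketched like this. The `ThresholdIndex` class, its method names, and the one-page-per-probe cost model are assumptions for illustration, not bzrlib's implementation; only the 5% figure comes from the comment.

```python
import math

BUFFER_THRESHOLD = 0.05  # buffer everything above 5% of the index's keys


class ThresholdIndex:
    """Hypothetical index that bisects for small requests, buffers for large ones."""

    def __init__(self, keys, page_size=100):
        self._keys = sorted(keys)
        self._page_size = page_size
        self._buffered = None  # set of every key, once the whole index is read
        self.pages_read = 0

    def _read_whole_index(self):
        # One sequential pass over every page; all keys stay in memory
        self.pages_read += math.ceil(len(self._keys) / self._page_size)
        self._buffered = set(self._keys)

    def _bisect(self, key):
        # Binary search, charging one page read per probe
        lo, hi = 0, len(self._keys)
        while lo < hi:
            self.pages_read += 1
            mid = (lo + hi) // 2
            if self._keys[mid] < key:
                lo = mid + 1
            else:
                hi = mid
        return lo < len(self._keys) and self._keys[lo] == key

    def lookup(self, keys):
        """Return the subset of `keys` present in the index."""
        keys = list(keys)
        if self._buffered is None and len(keys) > BUFFER_THRESHOLD * len(self._keys):
            self._read_whole_index()
        if self._buffered is not None:
            return {k for k in keys if k in self._buffered}
        return {k for k in keys if self._bisect(k)}


keys = [("file-%d" % (i % 10), "rev-%04d" % i) for i in range(1000)]
index = ThresholdIndex(keys)
few = keys[:10]                                     # 1% of the index
many = [("f", "rev-%04d" % i) for i in range(100)]  # 10%, none present
print(index.lookup(few) == set(few), index._buffered is None)
print(index.lookup(many), index._buffered is not None)
```

The threshold trades many small bisections for one full read of the index. That pays off when a request touches a large share of the index, but it costs memory proportional to the index size, which is the trade-off Rob raises about very large repositories.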

Martin Pool (mbp) wrote:

> Perhaps this should be Fix Released and a new bug opened?

I think so. Please open another bug if that will help keep track of what else needs to be done.

Martin Pool (mbp) on 2008-09-01
Changed in bzr:
status: Confirmed → Fix Released