Per-file log is *very* slow with packs

Bug #172567 reported by Lukáš Lalinský
4
Affects Status Importance Assigned to Milestone
Bazaar
Expired
Undecided
Unassigned

Bug Description

Knits (best of 5 runs):

$ time bzr log bzrlib/win32utils.py >/dev/null

real 0m1.637s
user 0m1.520s
sys 0m0.068s

Packs (best of 5 runs):

$ time bzr log bzrlib/win32utils.py >/dev/null

real 0m13.334s
user 0m10.869s
sys 0m0.292s

IMO, this is quite important regression.

Tags: log packs
Revision history for this message
John A Meinel (jameinel) wrote :

Any chance you could attach a callgrind/lsprof run with the difference?

My guess is that reading the per-file graph is having to read the complete graph of all files. So it is just time spent loading the .tix indexes.

bzr --lsprof-file ../packs.callgrind log bzrlib/win32utils.py > /dev/null
bzr --lsprof-file ../knits.callgrind log bzrlib/win32utils.py > /dev/null

And then attach the files here.

Changed in bzr:
status: New → Incomplete
Revision history for this message
Lukáš Lalinský (luks) wrote :
Revision history for this message
Lukáš Lalinský (luks) wrote :
Revision history for this message
Lukáš Lalinský (luks) wrote :

It seems to try to do some bisection, as the most expensive call is _lookup_keys_via_location. But just this function itself is more expensive than the whole command with knits.

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 172567] Per-file log is *very* slow with packs

 tag packs

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 172567] Re: Per-file log is *very* slow with packs

So there are several problems here.

One: iter_entries_prefix is being called; this is used when you do
iter_all_entries on a GraphPrefixAdapter. And Knit.versions(), or
Knit.__len__() both make this call.

So this line:
    weave_modifed_revisions = set(file_weave.versions())

is triggering a load of *all* file index history across the entire
repository.

That accounts for 10% of runtime up-front.

The other 67% big up front killer is Repository.get_revision_graph which
is triggering the partial index reads. You can see it make 33730 calls
to get_parents() doing this - and this is more of an issue than the
bisection call overheads; if we had topologically sorted indices we'd
see less fragmentation and less time in bisection.

So in short there are three obvious issues:
 - per file log is loading all text indices for the entire repository.
Aka. Knit.versions() is a size-of-history API.
 - get_revision_graph() is loading the desired history quite slowly;
perhaps the index layer should grab a native history-following call to
reduce this overhead
 - get_revision_graph() is called *twice* by the log code. 33K to 16K
calls into code that does IO is a good win.

I think we should file 3 bugs, all of these things are avoidable or
fixable.

-Rob

--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for Bazaar because there has been no activity for 60 days.]

Changed in bzr:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.