Bazaar

Per-file log is very slow with packs

Bug #172567 reported by Lukáš Lalinský on 2007-11-28

Affects		Status	Importance	Assigned to	Milestone
	Bazaar	Expired	Undecided	Unassigned

Bug Description

Knits (best of 5 runs):

$ time bzr log bzrlib/win32utils.py >/dev/null

real 0m1.637s
user 0m1.520s
sys 0m0.068s

Packs (best of 5 runs):

$ time bzr log bzrlib/win32utils.py >/dev/null

real 0m13.334s
user 0m10.869s
sys 0m0.292s

IMO, this is quite important regression.

Tags:

Revision history for this message

John A Meinel (jameinel) wrote on 2007-11-28:

Any chance you could attach a callgrind/lsprof run with the difference?

My guess is that reading the per-file graph is having to read the complete graph of all files. So it is just time spent loading the .tix indexes.

bzr --lsprof-file ../packs.callgrind log bzrlib/win32utils.py > /dev/null
bzr --lsprof-file ../knits.callgrind log bzrlib/win32utils.py > /dev/null

And then attach the files here.

Changed in bzr:
status:	New → Incomplete

Revision history for this message

Lukáš Lalinský (luks) wrote on 2007-11-28:

packs.callgrind Edit (224.0 KiB, text/plain)

Revision history for this message

Lukáš Lalinský (luks) wrote on 2007-11-28:

knit.callgrind Edit (189.1 KiB, text/plain)

Revision history for this message

Lukáš Lalinský (luks) wrote on 2007-11-28:

It seems to try to do some bisection, as the most expensive call is _lookup_keys_via_location. But just this function itself is more expensive than the whole command with knits.

Revision history for this message

Robert Collins (lifeless) wrote on 2007-11-28: Re: [Bug 172567] Per-file log is *very* slow with packs

tag packs

Revision history for this message

Robert Collins (lifeless) wrote on 2007-11-28: Re: [Bug 172567] Re: Per-file log is *very* slow with packs

So there are several problems here.

One: iter_entries_prefix is being called; this is used when you do
iter_all_entries on a GraphPrefixAdapter. And Knit.versions(), or
Knit.__len__() both make this call.

So this line:
weave_modifed_revisions = set(file_weave.versions())

is triggering a load of *all* file index history across the entire
repository.

That accounts for 10% of runtime up-front.

The other 67% big up front killer is Repository.get_revision_graph which
is triggering the partial index reads. You can see it make 33730 calls
to get_parents() doing this - and this is more of an issue than the
bisection call overheads; if we had topologically sorted indices we'd
see less fragmentation and less time in bisection.

So in short there are three obvious issues:
- per file log is loading all text indices for the entire repository.
Aka. Knit.versions() is a size-of-history API.
- get_revision_graph() is loading the desired history quite slowly;
perhaps the index layer should grab a native history-following call to
reduce this overhead
- get_revision_graph() is called *twice* by the log code. 33K to 16K
calls into code that does IO is a good win.

I think we should file 3 bugs, all of these things are avoidable or
fixable.

-Rob

--
GPG key available at: <http://www.robertcollins.net/keys.txt>.