loggerhead's revision graph cache may impair performance improvements

Bug #517209 reported by Max Kanat-Alexander
This bug affects 1 person
Affects            Status    Importance  Assigned to  Milestone
loggerhead         Triaged   High        Unassigned
loggerhead-breezy  Triaged   High        Unassigned

Bug Description

So, for the last few days, I've been investigating how and why we use the revision graph cache, to determine whether it can be removed.

There are only two purposes we use the cache for:

1. Translating revision ids to revnos. (This is what _rev_indices is for.)
2. Finding out where a revision id was merged in, and where it was merged from. We also sometimes specifically want to know when it was merged into mainline. (These uses are what _rev_info is for, and they're all essentially one use: figuring out the merge map.)
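As a rough illustration, both lookups can be derived in a single pass over merge-sorted graph data. The tuples below are synthetic and only approximate the (revision_id, merge_depth, revno, end_of_merge) shape that bzrlib's iter_merge_sorted_revisions yields; build_maps is a hypothetical helper, not loggerhead code:

```python
# Synthetic merge-sorted data, newest first: a mainline revision is
# followed by the revisions that were merged into it (depth > 0).
MERGE_SORTED = [
    ("rev-3", 0, (3,), False),          # mainline tip
    ("rev-2.1.1", 1, (2, 1, 1), True),  # merged into rev-3
    ("rev-2", 0, (2,), False),
    ("rev-1", 0, (1,), True),
]

def build_maps(merge_sorted):
    """Derive revid->revno and revid->merged-into-mainline in one pass."""
    revnos = {}
    merged_in = {}
    current_mainline = None
    for revid, depth, revno, _end in merge_sorted:
        revnos[revid] = revno
        if depth == 0:
            current_mainline = revid
            merged_in[revid] = revid   # a mainline rev is its own merge point
        else:
            merged_in[revid] = current_mainline
    return revnos, merged_in

revnos, merged_in = build_maps(MERGE_SORTED)
print(revnos["rev-2.1.1"])    # (2, 1, 1)
print(merged_in["rev-2.1.1"])  # 'rev-3'
```

Both of loggerhead's uses of the graph cache reduce to dictionary lookups once these two maps exist.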

It turns out that the data that we need for both of those operations is actually *already* being cached by bzrlib:

1. revno info is already cached by get_revision_id_to_revno_map, and bzrlib already has a better idea internally as to when that needs to be updated than we ever would. Also, I imagine that it's a relatively small data structure, and so we could safely cache it for hundreds of branches.

2. Calling branch.iter_merge_sorted_revisions the first time is slow, but calling it a second time is nearly instantaneous (less than a few ms on the launchpad devel branch). This method is where we spend about 2/3 of our time when actually generating the revision graph cache.
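The "slow first call, nearly free repeat" behaviour can be sketched with plain memoization. merge_sorted_revisions here is a hypothetical stand-in for the real bzrlib call, with a sleep simulating the expensive graph walk:

```python
import functools
import time

@functools.lru_cache(maxsize=None)
def merge_sorted_revisions(branch_path):
    """Stand-in for an expensive graph walk; cached after the first call."""
    time.sleep(0.05)  # simulate the slow first computation
    return ("rev-3", "rev-2.1.1", "rev-2", "rev-1")

t0 = time.time()
merge_sorted_revisions("devel")
first = time.time() - t0

t0 = time.time()
merge_sorted_revisions("devel")
second = time.time() - t0

print(second < first)  # the repeat call skips the graph walk entirely
```

This is roughly what bzrlib does internally for us already, which is why duplicating the result in loggerhead's own cache buys little.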

So what I was thinking is:

1. Instead of caching all of this data twice, essentially, we should just cache the branch objects themselves in the LRUCache. Since loggerhead doesn't do any write operations to the branches, I imagine it'd be fairly safe to use the same branch object across all our threads.
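A minimal sketch of such a cache of branch objects, using an OrderedDict for LRU eviction. The opener callable stands in for real bzrlib branch opening and is purely hypothetical:

```python
from collections import OrderedDict

class BranchCache:
    """Minimal LRU cache of opened branch objects, keyed by path.

    Sketch only: real loggerhead would store bzrlib Branch objects,
    which would need to be safe to share across request threads.
    """
    def __init__(self, max_size=100, opener=None):
        self._cache = OrderedDict()
        self._max_size = max_size
        self._opener = opener or (lambda path: "branch:" + path)

    def get(self, path):
        if path in self._cache:
            self._cache.move_to_end(path)        # mark as recently used
        else:
            if len(self._cache) >= self._max_size:
                self._cache.popitem(last=False)  # evict least recently used
            self._cache[path] = self._opener(path)
        return self._cache[path]

cache = BranchCache(max_size=2)
cache.get("a"); cache.get("b"); cache.get("a"); cache.get("c")
print(list(cache._cache))  # ['a', 'c'] -- 'b' was least recently used
```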

2. If there turns out to be any performance problem once this is done, then we create three simple additional caches: a revid-to-revno map, a "where I was merged in" map, and a "where I was merged from" map. These should be sufficiently small data structures that we could store them for 100+ branches. (It's also possible that we would just have to cache them per-request instead of persistently.) They would also be far simpler to use and understand than the current cache data structures.
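The "where I was merged from" direction complements the revno and merged-in maps: it maps each mainline revision to the revisions it brought in. A hypothetical sketch over the same synthetic merge-sorted tuple shape:

```python
def build_merged_from(merge_sorted):
    """Map each mainline revision to the revisions it merged, from
    tuples shaped like (revision_id, merge_depth, revno, end_of_merge)."""
    merged_from = {}
    mainline = None
    for revid, depth, _revno, _eom in merge_sorted:
        if depth == 0:
            mainline = revid
            merged_from[revid] = []
        else:
            merged_from[mainline].append(revid)
    return merged_from

data = [
    ("rev-3", 0, (3,), False),
    ("rev-2.1.1", 1, (2, 1, 1), True),
    ("rev-2", 0, (2,), False),
    ("rev-1", 0, (1,), True),
]
print(build_merged_from(data)["rev-3"])  # ['rev-2.1.1']
```

Each map is a plain dict over the branch's revision ids, which is why storing them for 100+ branches seems plausible.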

3. We can remove any internal disk caching, and encourage people to use bzr-historycache if they want a speedier loggerhead. bzr-historycache actually caches iter_merge_sorted_revisions and the revid-to-revno map (the exact two things we need), so given the above refactoring, I don't see anything we'd need our internal disk caches for. We could even add a switch called --use-historycache if we wanted to automatically switch the cache "on" for every branch without having to edit our branch.conf, locations.conf, or bazaar.conf.

4. We could possibly work on bzr-historycache to make it able to append revisions to the cache instead of regenerating it fully whenever the graph changes. (There's already some preliminary code in historycache to account for when it's able to do this, but it can't do it yet.)

Basically, out of this we'd get simplicity and speed by mostly removing code, though altogether it would be a fairly involved project.

Revision history for this message
Max Kanat-Alexander (mkanat) wrote :

Okay, so I talked a bit with mwhudson about that, and if it turns out that branches aren't thread-safe and can't be made thread-safe, then we can create the three simplified caches described in step 2, but only lazily, when they're needed. (That would be a big improvement since we only need the full-graph ones during RevLogUI and RevisionUI.)
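Building the caches lazily but safely across request threads could look roughly like this. LazyCache and its build callable are hypothetical, not loggerhead code; the lock ensures concurrent threads do not each rebuild the same map:

```python
import threading

class LazyCache:
    """Build an expensive value only on first use, with a lock so
    concurrent request threads do not each rebuild it."""
    def __init__(self, build):
        self._build = build
        self._lock = threading.Lock()
        self._value = None

    def get(self):
        if self._value is None:          # fast path, no lock taken
            with self._lock:
                if self._value is None:  # re-check under the lock
                    self._value = self._build()
        return self._value

calls = []
cache = LazyCache(lambda: calls.append(1) or {"rev-1": (1,)})
cache.get()
cache.get()
print(len(calls))  # the build callable ran exactly once
```

Since the full maps are only needed by RevLogUI and RevisionUI, most requests would never pay the build cost at all.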

summary: - remove/refactor loggerhead's revision graph cache
+ loggerhead's revision graph cache may impair performance improvements
Changed in loggerhead:
status: New → Triaged
importance: Undecided → High
Jelmer Vernooij (jelmer)
Changed in loggerhead-breezy:
status: New → Triaged
importance: Undecided → High