-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
> There are several issues here:
>
> 1. Can we tell via tools what repositories will be impacted by this,
> e.g. by running a plugin like repository-details?

    head -n5 .bzr/repository/indices/*.cix

would be sufficient. Basically, anything with more than 1000 pages is
going to 'thrash' a bit. Some more than others.

    B+Tree Graph Index 2
    node_ref_lists=0
    key_elements=1
    len=99083
    row_lengths=1,5,825

^- 99,083 is the number of keys, 825 is the number of leaf pages (the
bulk of the total).

Note that this is an old bzr.dev repo with everything in one pack. And
we are already at 825. So getting > 1000 is pretty easy.

For Launchpad, I get:

    len=398273
    row_lengths=1,23,3671

^- So we have 3.7k chk pages. With a 1000-page cache, that means the
cache hit rate is going to be about 1/3.7 ~ 27%.

>
> 2. What operations are impacted? Just branch or also things like pull,
> update and imports?

Pull and update will generally only read a fraction of the chk pages,
and thus aren't very likely to be heavily impacted.

This also only really affects dumb-transport users. It does affect local
operations (and the server side of remote ops), but the actual effect is
just re-reading a local page. Slower than it has to be, but not
re-downloading over a low-bandwidth connection.

>
> 3. Solution design. If users aren't going to know that this will impact
> them until after they do the branch, then adding a command line flag (vs
> some sort of auto-detection) doesn't sound very useful to me.
>
> It's very easy to extend the BTreeGraphIndex constructor with an
> optional parameter being the index size. The hard bit is passing that
> parameter in from a higher layer, given the way these indexes are
> currently instantiated by the repository layer as best I can tell. And
> how would the higher layer decide what size is appropriate?

Arguably we could set chk pages to have 'infinite' size. When a btree
reads its root page, it knows how many pages are available.
So it could do:

    if self._cache_everything:
        self._cache.resize(self._row_offsets[-1]) # + 1?

>
> Maybe we need a high level setting of "optimise for memory vs speed" as
> Windows does? Assuming memory is quite abundant these days, perhaps
> picking a larger default cache here (40MB vs 4MB) is fine. Those on
> limited memory laptops could add something to their bazaar.conf saying
> something like "optimize-for = limited-memory".
>

It would need to be something passed from the repository when it
constructs the indexes. Around line 1677 of pack_repo.py:

    if self.chk_index is not None:
        chk_index = self._make_index(name, '.cix')

That is the point at which we end up calling
"self._index_class(transport, index, index_size)" and the point where we
know that we are dealing with a chk index rather than 'just any' index.

So you could pass "_make_index" a "should cache everything" flag, which
could then be passed down into the BTreeGraphIndex constructor.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkqf41EACgkQJdeBCYSNAANlggCgh4UhRK0E6gZNu8uziJd46c0e
mDAAoLlQ5348Yr0rW7PUGc04RroJgU4X
=ln8G
-----END PGP SIGNATURE-----
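
To make the arithmetic and the "cache everything" idea above a bit more
concrete, here is a rough standalone sketch. It is not bzrlib code:
ToyIndex, make_index, parse_row_lengths, cacheable_fraction and
DEFAULT_CACHE_PAGES are all hypothetical names, and the 1000-page
default is only the threshold mentioned in the message. It assumes the
total page count is the sum of the row_lengths values, so the totals
come out slightly above the leaf-row figures quoted above.

```python
# Illustrative sketch only: none of these names are real bzrlib APIs.

DEFAULT_CACHE_PAGES = 1000  # assumed default, per the ~1000-page threshold


def parse_row_lengths(header_text):
    """Return the per-row page counts from the index header text."""
    prefix = "row_lengths="
    for line in header_text.splitlines():
        if line.startswith(prefix):
            return [int(n) for n in line[len(prefix):].split(",") if n]
    raise ValueError("no row_lengths= line in header")


def cacheable_fraction(total_pages, cache_pages=DEFAULT_CACHE_PAGES):
    """Fraction of the pages that fit in the cache, capped at 1.0."""
    if total_pages <= 0:
        return 1.0
    return min(1.0, float(cache_pages) / total_pages)


class ToyIndex(object):
    """Stand-in for an index that sizes its cache once the header is read."""

    def __init__(self, header_text, cache_everything=False):
        self.total_pages = sum(parse_row_lengths(header_text))
        self.cache_pages = DEFAULT_CACHE_PAGES
        if cache_everything:
            # Grow (never shrink) the cache so every page can stay resident.
            self.cache_pages = max(self.cache_pages, self.total_pages)


def make_index(header_text, suffix):
    """Hypothetical factory: only chk ('.cix') indexes cache everything."""
    return ToyIndex(header_text, cache_everything=(suffix == ".cix"))


if __name__ == "__main__":
    # Header values taken from the Launchpad numbers quoted above.
    launchpad = "len=398273\nrow_lengths=1,23,3671\n"
    index = make_index(launchpad, ".cix")
    print("total pages: %d" % index.total_pages)
    print("fraction cached by default: %.2f"
          % cacheable_fraction(index.total_pages))
    print("cache size with flag: %d" % index.cache_pages)
```

The point of routing the decision through the factory is that the
suffix is only known at the layer that builds the index, while the page
count is only known once the index has read its own header.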