.cix thrashing causes us to re-download the whole index multiple times

Bug #402623 reported by John A Meinel on 2009-07-21
Affects: Bazaar
Importance: Critical
Assigned to: John A Meinel

Bug Description

being split off from bug #402114

When a .cix index grows beyond ~100k entries (more than 1000 btree pages), we can no longer fit everything into the local cache. Because the access pattern is essentially 'random', this destroys our 'hit rate'.

When doing a fetch, we generally have to access the .cix 4-5 times. (read root nodes, read nodes underneath them, read root nodes for the tree shape, read nodes underneath them, etc.)

This means that when doing a fresh branch of a large project, we will re-read the entire .cix for the largest pack files up to 5 times. In the case of LP, this is 10MB+ of data that we re-read 5 times; at 50MB re-downloaded, that is roughly 50% of the size of an optimized pack file for the entire history (106MB).

The easiest thing to do is to just increase the buffering for .cix indexes.
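To see why random access over a large index defeats the page cache, here is a rough simulation (a hypothetical stand-in for the real btree node cache, using the default capacity of 1000 pages and a Launchpad-sized ~3700-page index):

```python
# Simulate an LRU page cache under uniformly random page requests.
# With the working set (NUM_PAGES) far larger than the cache, the
# steady-state hit rate is roughly CACHE_SIZE / NUM_PAGES (~27% here).
import random
from collections import OrderedDict

CACHE_SIZE = 1000   # pages the cache can hold (default node cache)
NUM_PAGES = 3700    # pages in a large .cix index
REQUESTS = 100000

random.seed(42)
cache = OrderedDict()
hits = 0
for _ in range(REQUESTS):
    page = random.randrange(NUM_PAGES)
    if page in cache:
        hits += 1
        cache.move_to_end(page)          # mark as most recently used
    else:
        cache[page] = True
        if len(cache) > CACHE_SIZE:
            cache.popitem(last=False)    # evict least recently used

hit_rate = float(hits) / REQUESTS
print("hit rate: %.0f%%" % (100 * hit_rate))
```

Every miss is another page fetched over the wire, which is why several full passes over the index end up re-downloading most of it each time.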


John A Meinel (jameinel) on 2009-07-21
Changed in bzr:
importance: Undecided → High
status: New → Triaged
Changed in bzr:
importance: High → Critical
Robert Collins (lifeless) wrote :

I'm not sure what to do about this :(.

Martin Pool (mbp) wrote :

2009/8/14 Robert Collins <email address hidden>:
> I'm not sure what to do about this :(.

Can we let the cache grow to fit at least one whole index? Thrashing
into swap is probably better than thrashing over the network.

Is this a release blocker?

--
Martin <http://launchpad.net/~mbp/>

Robert Collins (lifeless) wrote :

On Fri, 2009-08-14 at 03:15 +0000, Martin Pool wrote:
> 2009/8/14 Robert Collins <email address hidden>:
> > I'm not sure what to do about this :(.
>
> Can we let the cache grow to fit at least one whole index? Thrashing
> into swap is probably better than thrashing over the network.
>
> Is this a release blocker?

I don't think so because:
 - few projects are _really big_
 - those that aren't should be fine.

-Rob

Christian Reis (kiko) wrote :

Is there a workaround that projects that run into this can apply? For release-noting if nothing else.

John A Meinel (jameinel) wrote :

So the trivial fix is:
=== modified file 'bzrlib/btree_index.py'
--- bzrlib/btree_index.py 2009-08-13 19:56:26 +0000
+++ bzrlib/btree_index.py 2009-08-14 18:55:59 +0000
@@ -46,7 +46,7 @@
 _PAGE_SIZE = 4096

 # 4K per page: 4MB - 1000 entries
-_NODE_CACHE_SIZE = 1000
+_NODE_CACHE_SIZE = 100000

One could easily put this into a plugin.
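A minimal sketch of such a plugin (the path and module name are hypothetical; the guarded import only makes the snippet safe to run where bzrlib is absent):

```python
# Hypothetical ~/.bazaar/plugins/big_index_cache/__init__.py
# Bumps the btree node cache from 1000 pages (~4MB) to 100000 pages
# (~400MB at 4K/page), matching John's diff above.
try:
    from bzrlib import btree_index
except ImportError:  # bzrlib not installed; plugin does nothing
    btree_index = None

if btree_index is not None:
    btree_index._NODE_CACHE_SIZE = 100000
```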

Also note that this is only true when reading over a 'dumb' transport. The smart server code reads things locally and streams the content over the wire, the indexes are not read directly.

A more involved option would be to set up a special case for .cix because we know we lose 'locality' there, and to only bump up the node cache size for cix.

Martin Pool (mbp) wrote :

2009/8/15 John A Meinel <email address hidden>:
> So the trivial fix is:
> === modified file 'bzrlib/btree_index.py'
> --- bzrlib/btree_index.py       2009-08-13 19:56:26 +0000
> +++ bzrlib/btree_index.py       2009-08-14 18:55:59 +0000
> @@ -46,7 +46,7 @@
>  _PAGE_SIZE = 4096
>
>  # 4K per page: 4MB - 1000 entries
> -_NODE_CACHE_SIZE = 1000
> +_NODE_CACHE_SIZE = 100000

That's the kind of thing I'd look at putting in
-Dindex_cache_size=100000 (which would require a small patch to
support key/value debug flags); obviously it's inferior to just
working but it's more feasible for most users than a patch or even a
plugin.

--
Martin <http://launchpad.net/~mbp/>

John A Meinel (jameinel) wrote :

Martin Pool wrote:
> 2009/8/15 John A Meinel <email address hidden>:
>> So the trivial fix is:
>> === modified file 'bzrlib/btree_index.py'
>> --- bzrlib/btree_index.py 2009-08-13 19:56:26 +0000
>> +++ bzrlib/btree_index.py 2009-08-14 18:55:59 +0000
>> @@ -46,7 +46,7 @@
>> _PAGE_SIZE = 4096
>>
>> # 4K per page: 4MB - 1000 entries
>> -_NODE_CACHE_SIZE = 1000
>> +_NODE_CACHE_SIZE = 100000
>
> That's the kind of thing I'd look at putting in
> -Dindex_cache_size=100000 (which would require a small patch to
> support key/value debug flags); obviously it's inferior to just
> working but it's more feasible for most users than a patch or even a
> plugin.
>

You could make it "-Dbig_index_cache" and then set it to something
ridiculously high.

John
=:->


Changed in bzr:
assignee: nobody → Ian Clatworthy (ian-clatworthy)
Ian Clatworthy (ian-clatworthy) wrote :

There are several issues here:

1. Can we tell via tools what repositories will be impacted by this, e.g. by running a plugin like repository-details?

2. What operations are impacted? Just branch or also things like pull, update and imports?

3. Solution design. If users aren't going to know that this will impact them until after they do the branch, then adding a command line flag (vs some sort of auto-detection) doesn't sound very useful to me.

It's very easy to extend the BTreeGraphIndex constructor with an optional parameter being the index size. The hard bit is passing that parameter in from a higher layer, given the way these indexes are currently instantiated by the repository layer as best I can tell. And how would the higher layer decide what size is appropriate?

Maybe we need a high level setting of "optimise for memory vs speed" as Windows does? Assuming memory is quite abundant these days, perhaps picking a larger default cache here (40MB vs 4MB) is fine. Those on limited memory laptops could add something to their bazaar.conf saying something like "optimize-for = limited-memory".

Ian Clatworthy (ian-clatworthy) wrote :

I'll try John's patch on OOo and see what the impact is w.r.t speed vs memory usage of local branching.

John A Meinel (jameinel) wrote :

Ian Clatworthy wrote:
> There are several issues here:
>
> 1. Can we tell via tools what repositories will be impacted by this,
> e.g. by running a plugin like repository-details?

head -n5 .bzr/repository/indices/*.cix

would be sufficient.

Basically, anything with more than 1000 pages is going to 'thrash' a
bit. Some more than others.

B+Tree Graph Index 2
node_ref_lists=0
key_elements=1
len=99083
row_lengths=1,5,825

^- 99,083 is the number of keys, 825 is the total number of pages. Note
that this is an old bzr.dev repo with everything in one pack. And we are
already at 825. So getting > 1000 is pretty easy.

For Launchpad, I get:
len=398273
row_lengths=1,23,3671

^- So we have 3.7k chk pages. Meaning the cache hit rate is going to be
1/3.7 ~ 27%.
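The 27% estimate follows directly from the header fields; a quick sketch of the arithmetic (hypothetical parser; the header format is the one shown above):

```python
# Parse the key=value lines of a .cix header and estimate the cache
# hit rate as (node cache capacity) / (total pages in the index).
header = """B+Tree Graph Index 2
node_ref_lists=0
key_elements=1
len=398273
row_lengths=1,23,3671"""

fields = dict(line.split('=', 1)
              for line in header.splitlines() if '=' in line)
total_pages = sum(int(n) for n in fields['row_lengths'].split(','))

node_cache_size = 1000  # the default _NODE_CACHE_SIZE
hit_rate = min(1.0, float(node_cache_size) / total_pages)
print("pages=%d, approx hit rate=%.0f%%" % (total_pages, 100 * hit_rate))
```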

>
> 2. What operations are impacted? Just branch or also things like pull,
> update and imports?

Pull and update will generally only read a fraction of the chk pages,
and thus aren't very likely to be heavily impacted.

This also only really affects dumb-transport users. It does affect local
operations (and the server side of remote ops), but there the cost is
just re-reading a local page: slower than it has to be, but not
re-downloading over a low-bandwidth connection.

>
> 3. Solution design. If users aren't going to know that this will impact
> them until after they do the branch, then adding a command line flag (vs
> some sort of auto-detection) doesn't sound very useful to me.
>
> It's very easy to extend the BTreeGraphIndex constructor with an
> optional parameter being the index size. The hard bit is passing that
> parameter in from a higher layer, given the way these indexes are
> currently instantiated by the repository layer as best I can tell. And
> how would the higher layer decide what size is appropriate?

Arguably we could give chk indexes an 'infinite' cache size. When a btree
reads its root page, it knows how many pages are available. So it could do:

if self._cache_everything:
  self._cache.resize(self._row_offsets[-1]) # + 1?

>
> Maybe we need a high level setting of "optimise for memory vs speed" as
> Windows does? Assuming memory is quite abundant these days, perhaps
> picking a larger default cache here (40MB vs 4MB) is fine. Those on
> limited memory laptops could add something to their bazaar.conf saying
> something like "optimize-for = limited-memory".
>

It would need to be something passed from the repository when it
constructs the indexes.

Around line 1677 of pack_repo.py:
            if self.chk_index is not None:
                chk_index = self._make_index(name, '.cix')

That is the point at which we end up calling
"self._index_class(transport, index, index_size)" and the point where we
know that we are dealing with a chk index rather than 'just any' index.

So you could pass "make_index" a "should cache everything flag" which
could then be passed down into the BTreeIndex constructor.

John
=:->
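The plumbing John describes can be sketched with stand-in classes (simplified, not the real bzrlib code; the `unlimited_cache` parameter name is illustrative):

```python
# Stand-in sketch: the repository layer decides, by index suffix,
# whether the index should be allowed to cache every page.
class BTreeGraphIndex(object):
    def __init__(self, transport, name, size, unlimited_cache=False):
        # With unlimited_cache set, the node cache would be resized to
        # hold every page once the root page reveals the page count.
        self._unlimited_cache = unlimited_cache

class PackCollection(object):
    def __init__(self, transport):
        self._transport = transport

    def _make_index(self, name, suffix, unlimited_cache=False):
        # This layer knows the suffix, so it is the point where '.cix'
        # indexes can be singled out for full caching.
        return BTreeGraphIndex(self._transport, name + suffix, 0,
                               unlimited_cache=unlimited_cache)

packs = PackCollection(transport=None)
chk_index = packs._make_index('pack-1234', '.cix', unlimited_cache=True)
```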


John A Meinel (jameinel) on 2009-09-09
Changed in bzr:
assignee: Ian Clatworthy (ian-clatworthy) → John A Meinel (jameinel)
John A Meinel (jameinel) on 2009-09-09
Changed in bzr:
milestone: none → 2.0.1
John A Meinel (jameinel) wrote :

There is a patch up for review.
In testing this, I went ahead and used bzr.dev to download http://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel
I set -Dhttp and then used some silly grep hacks to check Content-Length headers:
grep -A 17 "http.*cix" .bzr.log | grep Content-Length
< Content-Length: 4096
< Content-Length: 94208
< Content-Length: 15846927
< Content-Length: 433
< Content-Length: 17119
< Content-Length: 583
< Content-Length: 298
< Content-Length: 610
< Content-Length: 12349440
< Content-Length: 12004988
< Content-Length: 11701226
< Content-Length: 1092692

Which in total is 50.7MiB
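For the record, the 50.7MiB total can be reproduced by summing the Content-Length values above:

```python
# Sum the twelve Content-Length values from the .cix requests.
lengths = [4096, 94208, 15846927, 433, 17119, 583, 298, 610,
           12349440, 12004988, 11701226, 1092692]
total = sum(lengths)
print("%.1f MiB" % (total / float(1 << 20)))  # → 50.7 MiB
```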

Versus just 'grep Content-Length' which was:
 180.5 MiB

So out of 180.5 MiB downloaded, 28% of that was the .cix indexes being downloaded 4+ times. (The last request is probably reading the parent_id,basename=>file_id index pages, which are significantly smaller than the full content.)

In comparison, doing the same action with my patch resulted in:
< Content-Length: 4096
< Content-Length: 94208
< Content-Length: 15846927
< Content-Length: 433
< Content-Length: 17119
< Content-Length: 583
< Content-Length: 298
< Content-Length: 610

    15.2 MiB / 143.1 MiB

Peak memory did increase to about 650MiB during the transfer, though I'm
trying to address that sort of issue with different patches.

Changed in bzr:
status: Triaged → Fix Committed
Andrew Bennetts (spiv) wrote :

This fix is now merged to both lp:bzr and lp:bzr/2.0

Changed in bzr:
status: Fix Committed → Fix Released