Comment 8 for bug 622566

Revision history for this message
Alexander Belchenko (bialix) wrote : Re: [Bug 622566] Re: ftp access inefficiency

John Arbash Meinel пишет:
> On 8/24/2010 4:02 PM, Alexander Belchenko wrote:
>> I've coded a simple cached FTP transport on top of standard FTP transport in bzrlib, the plugin is available at lp:~bialix/+junk/kftp
>> and you need to use kftp://host/path URL instead of ftp://host/path.
>> It works good and indeed reduce the tme for pull and even push. I know it's ad-hoc, but it's better than nothing at all. I think it should be safe to cache packs/indices because they're supposed to be write-once data.
>
>
> I think you can arguably cache any downloaded content for the lifetime
> of the lock in the outer scope. As long as you take care of invalidating
> cached data that you just uploaded. (So if you have something cached by
> 'get' or 'readv()' then it should be invalided/updated by a 'put' call.)

Yes, I'm invalidating the cache on put, and on rename/move calls. But
looking at logs it seems only put matters. Also looking at logs it
seems all I need is to cache pack files, because bzr doesn't seem to
request indices more than 1 time.

> Note that this will still be significantly more inefficient that sftp,
> because you aren't actually requesting a subset. You are just avoiding
> requesting the same content twice. (So if you request 10 bytes 10 times
> in a 1MB file, you'll still download 1MB, but that is certainly better
> than 10MB.)

Based on my measurements now cached ftp and sftp took roughly the same
time to make the full pull. For partial pull of 1 revision (as in the
initial bug report) now it took 53 seconds instead of 4 minutes. And
sftp partial pull of the same data took 62 seconds for me. So they're
actually very close. I think it depends on how data packed actually
into pack file. Also there is noticeable delay in sftp case to
establish the connection at the beginning of transaction (up to 5
seconds, if no more).

> If you do the cache invalidation, I think we would be fine bringing that
> into bzr core.

I'll prepare a patch then.

One question: do you think it's OK to explicitly cache only pack files
(I'm checking the file extension now)? It seems as bad practice to
bring the knowledge about packs into transport layer. Another way may
be to cache only big files, greater than some threshold, e.g. 64KiB or
so, so we won't cache lock info files, branch.conf, last-revision and
pack-names.

--
All the dude wanted was his rug back