ftp access inefficiency

Bug #622566 reported by Alexander Belchenko
This bug affects 2 people
Affects   Status      Importance   Assigned to   Milestone
Bazaar    Confirmed   High         Unassigned
Breezy    Won't Fix   Low          Unassigned

Bug Description

Doing a pull of a couple of revisions over FTP is very slow. Today I spent about 4 minutes pulling 1 revision with changes in 2 files which are not very big. After the pull, the new pack in my 2a repo is only 6KB.

Looking at the .bzr.log I see that bzr downloads the same pack file from the FTP server again and again and again:

12.234 Using fetch logic to copy between
CHKInventoryRepository('ftp://<email address hidden>/xxx/.bzr/repository/')(RepositoryFormat2a())
and CHKInventoryRepository('file:///C:/work/Bazaar/xxx/.bzr/repository/')(RepositoryFormat2a())
12.234 fetch up to rev {<email address hidden>}
12.234 FTP get: /xxx/.bzr/repository/indices/670812de3ad30ee5a426cb15af4ec05f.rix
12.922 FTP get: /xxx/.bzr/repository/indices/3a9ac6d908537c328ad3b65becc869ec.rix
13.750 FTP get: /xxx/.bzr/repository/indices/1eb19f45fbb44be0b59b4bf6120f0b5b.rix
14.672 FTP get: /xxx/.bzr/repository/indices/670812de3ad30ee5a426cb15af4ec05f.six
15.312 FTP get: /xxx/.bzr/repository/indices/3a9ac6d908537c328ad3b65becc869ec.six
16.031 FTP get: /xxx/.bzr/repository/indices/1eb19f45fbb44be0b59b4bf6120f0b5b.six
16.781 FTP get: /xxx/.bzr/repository/packs/3a9ac6d908537c328ad3b65becc869ec.pack
53.312 FTP get: /xxx/.bzr/repository/indices/670812de3ad30ee5a426cb15af4ec05f.iix
54.219 FTP get: /xxx/.bzr/repository/indices/3a9ac6d908537c328ad3b65becc869ec.iix
54.937 FTP get: /xxx/.bzr/repository/packs/3a9ac6d908537c328ad3b65becc869ec.pack
84.844 FTP get: /xxx/.bzr/repository/packs/3a9ac6d908537c328ad3b65becc869ec.pack
124.609 FTP get: /xxx/.bzr/repository/indices/670812de3ad30ee5a426cb15af4ec05f.cix
125.359 FTP get: /xxx/.bzr/repository/indices/3a9ac6d908537c328ad3b65becc869ec.cix
126.250 FTP get: /xxx/.bzr/repository/packs/3a9ac6d908537c328ad3b65becc869ec.pack
166.953 FTP get: /xxx/.bzr/repository/packs/3a9ac6d908537c328ad3b65becc869ec.pack
203.562 FTP get: /xxx/.bzr/repository/indices/670812de3ad30ee5a426cb15af4ec05f.tix
204.281 FTP get: /xxx/.bzr/repository/indices/3a9ac6d908537c328ad3b65becc869ec.tix
205.000 FTP get: /xxx/.bzr/repository/packs/3a9ac6d908537c328ad3b65becc869ec.pack
243.531 FTP get: /xxx/ddaa/.bzr/branch/tags

As you can see, the pack file 3a9ac6d908537c328ad3b65becc869ec.pack has been transferred 6 times. The bad thing is that this pack file is 2.5 MB big.

How could this be improved/fixed? Maybe some simple cache to keep downloaded files in memory, or on disk in a TEMP location? I need help to fix this.

Revision history for this message
Alexander Belchenko (bialix) wrote :

Every download of that file via FTP costs 27 seconds. So it's 162 seconds (2.5+ minutes) just to re-download the same file again and again.

Changed in bzr:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Alexander Belchenko (bialix) wrote :

The difference between sftp and ftp access to the same branch, for getting the full branch (initial branching), is 82s vs 542s.

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 622566] [NEW] ftp access inefficiency

On 8/23/2010 1:58 AM, Alexander Belchenko wrote:
> Public bug reported:
>
> Doing a pull of couple revisions over FTP is very slow. Today I've spent
> about 4 minutes to pull 1 revision with changes in 2 files which are not
> very big. After pull new pack in my 2a repo is only 6KB big.
>
> Looking at the .bzr.log I see that bzr downloading the same pack file
> from the FTP server again and again and again:

Just to mention we've known about this for quite some time (I'm a bit
surprised there isn't already a bug).

Basically, we don't support partial requests from an existing file
(readv), and we don't cache the file content between requests. So 10
separate readv requests for 10 bytes each on a 100MB file will download
10*100MB = 1GB just to offer up the 100 bytes requested.
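
In code, that fallback amounts to something like this illustrative toy
(hypothetical class, not the actual bzrlib implementation):

    import io

    class NaiveReadvTransport(object):
        """Toy model of a transport with no native readv() support."""

        def __init__(self, remote_bytes):
            self.remote_bytes = remote_bytes   # stands in for the remote file
            self.bytes_transferred = 0

        def get(self, relpath):
            # Models a full FTP RETR: the whole file crosses the wire.
            self.bytes_transferred += len(self.remote_bytes)
            return io.BytesIO(self.remote_bytes)

        def readv(self, relpath, offsets):
            # Fallback readv: get + seek + read, once per requested range.
            for offset, length in offsets:
                f = self.get(relpath)          # re-downloads everything
                f.seek(offset)
                yield offset, f.read(length)

    t = NaiveReadvTransport(b"x" * (1024 * 1024))            # a 1MB "file"
    list(t.readv("a.pack", [(i * 10, 10) for i in range(10)]))
    assert t.bytes_transferred == 10 * 1024 * 1024           # ~10MB for 100 bytes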

I think the fix is to figure out how to do partial reads, since some ftp
clients do support 'resume', which must rely on some way of requesting
content that doesn't begin at the start of the file.

That said, our FTP support is 'functional' but *far* from optimal. If
you have any chance to use sftp instead, do so. Even better if you just
use bzr+ssh...

John
=:->

Revision history for this message
Alexander Belchenko (bialix) wrote :

John A Meinel writes:
> That said, our FTP support is 'functional' but *far* from optimal. If
> you have any chance to use sftp instead, do so. Even better if you just
> use bzr+ssh...

In this particular case I need FTP for collaboration with others.

Revision history for this message
Alexander Belchenko (bialix) wrote :

I've coded a simple cached FTP transport on top of the standard FTP transport in bzrlib; the plugin is available at lp:~bialix/+junk/kftp,
and you need to use a kftp://host/path URL instead of ftp://host/path.
It works well and indeed reduces the time for pull and even push. I know it's ad-hoc, but it's better than nothing at all. I think it should be safe to cache packs/indices because they're supposed to be write-once data.
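
The core idea is roughly this (a simplified sketch, not the actual plugin
code; the class name and Transport-style method names are illustrative):

    class CachedFtpTransport(object):
        """Sketch of a caching wrapper around a real FTP transport."""

        def __init__(self, ftp_transport):
            self._ftp = ftp_transport
            self._cache = {}                 # relpath -> full file content

        def get_bytes(self, relpath):
            if relpath not in self._cache:
                self._cache[relpath] = self._ftp.get_bytes(relpath)
            return self._cache[relpath]

        def readv(self, relpath, offsets):
            data = self.get_bytes(relpath)   # one download, many reads
            for offset, length in offsets:
                yield offset, data[offset:offset + length]

        def put_bytes(self, relpath, raw_bytes):
            # Writing invalidates any stale cached copy.
            self._cache.pop(relpath, None)
            return self._ftp.put_bytes(relpath, raw_bytes)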

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 622566] Re: ftp access inefficiency

On 8/24/2010 4:02 PM, Alexander Belchenko wrote:
> I've coded a simple cached FTP transport on top of the standard FTP transport in bzrlib; the plugin is available at lp:~bialix/+junk/kftp,
> and you need to use a kftp://host/path URL instead of ftp://host/path.
> It works well and indeed reduces the time for pull and even push. I know it's ad-hoc, but it's better than nothing at all. I think it should be safe to cache packs/indices because they're supposed to be write-once data.
>

I think you can arguably cache any downloaded content for the lifetime
of the lock in the outer scope. As long as you take care of invalidating
cached data that you just uploaded. (So if you have something cached by
'get' or 'readv()' then it should be invalidated/updated by a 'put' call.)

Note that this will still be significantly more inefficient than sftp,
because you aren't actually requesting a subset. You are just avoiding
requesting the same content twice. (So if you request 10 bytes 10 times
in a 1MB file, you'll still download 1MB, but that is certainly better
than 10MB.)

If you do the cache invalidation, I think we would be fine bringing that
into bzr core.

John
=:->

Revision history for this message
John A Meinel (jameinel) wrote :

On 8/24/2010 4:02 PM, Alexander Belchenko wrote:
> I've coded a simple cached FTP transport on top of the standard FTP transport in bzrlib; the plugin is available at lp:~bialix/+junk/kftp,
> and you need to use a kftp://host/path URL instead of ftp://host/path.
> It works well and indeed reduces the time for pull and even push. I know it's ad-hoc, but it's better than nothing at all. I think it should be safe to cache packs/indices because they're supposed to be write-once data.
>

Note a link like this:
  http://cr.yp.to/ftp/retr.html

It specifies the REST verb. The good is that it lets you
start in the middle of the content. The bad is:

1) There doesn't seem to be a way to 'seek' after reading, or stop the
   existing download without closing the connection. So if you want to
   read 100 bytes at offset 10,000, you can seek to 10,000, but it will
   try to stream the remaining 990,000 bytes back to you.
2) Most of our pack-based 'readv' requests actually request the first
   bytes of the file, because we want to make sure that the pack magic
   header is present.
   There certainly can be stuff we can do here, especially with caching.
3) Some of this might be decent to do if we got fancier about handling
   the individual ftp connections. You connect for the next data
   transfer, grab some bytes, and then disconnect on that socket only,
   and not the control socket.
   That is about the extent of my knowledge of ftp, though.
4) I did check the docs, and ftplib does support REST:
   http://docs.python.org/library/ftplib.html#ftplib.FTP.retrbinary

   It has a 'rest' parameter. It may cause an exception to be raised if
   REST is not supported, but we can trap that and retry.

   They also allow you to set the 'maxblocksize', which will mean more
   bytes are probably on the wire, but we can close the socket once we've
   got enough bytes, etc.
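
A rough, untested sketch of what a ranged read via REST might look like
with ftplib (it assumes the server accepts REST, omits the trap-and-retry
for servers that don't, and the host and path are made up):

    import ftplib

    def read_range(ftp, path, offset, length):
        """Fetch `length` bytes starting at `offset` using REST + RETR."""
        ftp.voidcmd('TYPE I')                       # binary mode
        conn = ftp.transfercmd('RETR %s' % path, rest=offset)
        chunks, remaining = [], length
        while remaining > 0:
            data = conn.recv(min(remaining, 8192))
            if not data:
                break
            chunks.append(data)
            remaining -= len(data)
        # The server is still streaming the rest of the file; ABOR asks it
        # to stop so the control connection stays usable afterwards.
        ftp.abort()
        conn.close()
        return b''.join(chunks)

    ftp = ftplib.FTP('ftp.example.com')             # made-up host
    ftp.login('user', 'password')
    magic = read_range(ftp, '/repo/packs/example.pack', 0, 12)   # made-up path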

John
=:->

Revision history for this message
Alexander Belchenko (bialix) wrote :

John Arbash Meinel writes:
> On 8/24/2010 4:02 PM, Alexander Belchenko wrote:
>> I've coded a simple cached FTP transport on top of the standard FTP transport in bzrlib; the plugin is available at lp:~bialix/+junk/kftp,
>> and you need to use a kftp://host/path URL instead of ftp://host/path.
>> It works well and indeed reduces the time for pull and even push. I know it's ad-hoc, but it's better than nothing at all. I think it should be safe to cache packs/indices because they're supposed to be write-once data.
>
>
> I think you can arguably cache any downloaded content for the lifetime
> of the lock in the outer scope. As long as you take care of invalidating
> cached data that you just uploaded. (So if you have something cached by
> 'get' or 'readv()' then it should be invalidated/updated by a 'put' call.)

Yes, I'm invalidating the cache on put, and on rename/move calls. But
looking at the logs it seems only put matters. Also looking at the logs it
seems all I need is to cache pack files, because bzr doesn't seem to
request indices more than once.

> Note that this will still be significantly more inefficient than sftp,
> because you aren't actually requesting a subset. You are just avoiding
> requesting the same content twice. (So if you request 10 bytes 10 times
> in a 1MB file, you'll still download 1MB, but that is certainly better
> than 10MB.)

Based on my measurements, cached ftp and sftp now take roughly the same
time to make the full pull. A partial pull of 1 revision (as in the
initial bug report) now took 53 seconds instead of 4 minutes. And an
sftp partial pull of the same data took 62 seconds for me. So they're
actually very close. I think it depends on how the data is actually
packed into the pack file. Also there is a noticeable delay in the sftp
case to establish the connection at the beginning of the transaction
(up to 5 seconds, if not more).

> If you do the cache invalidation, I think we would be fine bringing that
> into bzr core.

I'll prepare a patch then.

One question: do you think it's OK to explicitly cache only pack files
(I'm checking the file extension now)? It seems like bad practice to
bring the knowledge about packs into the transport layer. Another way may
be to cache only big files, greater than some threshold, e.g. 64KiB or
so, so we won't cache lock info files, branch.conf, last-revision and
pack-names.

--
All the dude wanted was his rug back

Revision history for this message
John A Meinel (jameinel) wrote :

On 8/25/2010 12:49 AM, Alexander Belchenko wrote:
> John Arbash Meinel writes:
>> On 8/24/2010 4:02 PM, Alexander Belchenko wrote:
>>> I've coded a simple cached FTP transport on top of the standard FTP transport in bzrlib; the plugin is available at lp:~bialix/+junk/kftp,
>>> and you need to use a kftp://host/path URL instead of ftp://host/path.
>>> It works well and indeed reduces the time for pull and even push. I know it's ad-hoc, but it's better than nothing at all. I think it should be safe to cache packs/indices because they're supposed to be write-once data.
>>
>>
>> I think you can arguably cache any downloaded content for the lifetime
>> of the lock in the outer scope. As long as you take care of invalidating
>> cached data that you just uploaded. (So if you have something cached by
>> 'get' or 'readv()' then it should be invalidated/updated by a 'put' call.)
>
> Yes, I'm invalidating the cache on put, and on rename/move calls. But
> looking at the logs it seems only put matters. Also looking at the logs it
> seems all I need is to cache pack files, because bzr doesn't seem to
> request indices more than once.

It will really depend on the action being performed. "bzr log" over the
whole history will make repeated passes to the .rix files. Etc.

>
>> Note that this will still be significantly more inefficient than sftp,
>> because you aren't actually requesting a subset. You are just avoiding
>> requesting the same content twice. (So if you request 10 bytes 10 times
>> in a 1MB file, you'll still download 1MB, but that is certainly better
>> than 10MB.)
>
> Based on my measurements, cached ftp and sftp now take roughly the same
> time to make the full pull. A partial pull of 1 revision (as in the
> initial bug report) now took 53 seconds instead of 4 minutes. And an
> sftp partial pull of the same data took 62 seconds for me. So they're
> actually very close. I think it depends on how the data is actually
> packed into the pack file. Also there is a noticeable delay in the sftp
> case to establish the connection at the beginning of the transaction
> (up to 5 seconds, if not more).

The numbers will depend a lot on your latency versus bandwidth. I also
haven't actually looked at your plugin yet.

>
>> If you do the cache invalidation, I think we would be fine bringing that
>> into bzr core.
>
> I'll prepare a patch then.
>
> One question: do you think it's OK to explicitly cache only pack files
> (I'm checking the file extension now)? It seems like bad practice to
> bring the knowledge about packs into the transport layer. Another way may
> be to cache only big files, greater than some threshold, e.g. 64KiB or
> so, so we won't cache lock info files, branch.conf, last-revision and
> pack-names.
>

The things you want to cache are accessed via 'readv' rather than via
'get' directly. I believe the FTP transport doesn't actually
implement readv, so it falls back to the default behavior, which calls
get + seek + read (which works for the Local version, not so well for
everything else).

If you just implemented readv() and then only cached requests coming
from there, that would work pretty well.
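
A minimal sketch of that approach (hypothetical subclass; the method
names reflect my reading of the bzrlib Transport API and may differ in
detail):

    from bzrlib.transport.ftp import FtpTransport

    class ReadvCachingFtpTransport(FtpTransport):
        """Sketch: cache whole files, but only those touched by readv()."""

        def __init__(self, *args, **kwargs):
            FtpTransport.__init__(self, *args, **kwargs)
            self._readv_cache = {}

        def _readv(self, relpath, offsets):
            # One full download per file; every requested range is then
            # served from the in-memory copy instead of a fresh RETR.
            if relpath not in self._readv_cache:
                self._readv_cache[relpath] = self.get_bytes(relpath)
            data = self._readv_cache[relpath]
            for offset, length in offsets:
                yield offset, data[offset:offset + length]

        def append_file(self, relpath, f, mode=None):
            # We may readv() something we just appended to, so drop any
            # cached copy first.
            self._readv_cache.pop(relpath, None)
            return FtpTransport.append_file(self, relpath, f, mode=mode)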

Note also that you'll want to check into the append code a bit, too.
Sometimes we'll readv() from something we just appended to.

I think that would be a better option than trying to find a heuristic
based on file size.

...

Revision history for this message
Alexander Belchenko (bialix) wrote :

John Arbash Meinel writes:
> On 8/25/2010 12:49 AM, Alexander Belchenko wrote:
>> John Arbash Meinel writes:
>>> On 8/24/2010 4:02 PM, Alexander Belchenko wrote:
>>>> I've coded a simple cached FTP transport on top of the standard FTP transport in bzrlib; the plugin is available at lp:~bialix/+junk/kftp,
>>>> and you need to use a kftp://host/path URL instead of ftp://host/path.
>>>> It works well and indeed reduces the time for pull and even push. I know it's ad-hoc, but it's better than nothing at all. I think it should be safe to cache packs/indices because they're supposed to be write-once data.
>>>
>>> I think you can arguably cache any downloaded content for the lifetime
>>> of the lock in the outer scope. As long as you take care of invalidating
>>> cached data that you just uploaded. (So if you have something cached by
>>> 'get' or 'readv()' then it should be invalidated/updated by a 'put' call.)
>> Yes, I'm invalidating the cache on put, and on rename/move calls. But
>> looking at the logs it seems only put matters. Also looking at the logs it
>> seems all I need is to cache pack files, because bzr doesn't seem to
>> request indices more than once.
>
> It will really depend on the action being performed. "bzr log" over the
> whole history will make repeated passes to the .rix files. Etc.

OK, it makes sense.

>>> Note that this will still be significantly more inefficient than sftp,
>>> because you aren't actually requesting a subset. You are just avoiding
>>> requesting the same content twice. (So if you request 10 bytes 10 times
>>> in a 1MB file, you'll still download 1MB, but that is certainly better
>>> than 10MB.)
>> Based on my measurements, cached ftp and sftp now take roughly the same
>> time to make the full pull. A partial pull of 1 revision (as in the
>> initial bug report) now took 53 seconds instead of 4 minutes. And an
>> sftp partial pull of the same data took 62 seconds for me. So they're
>> actually very close. I think it depends on how the data is actually
>> packed into the pack file. Also there is a noticeable delay in the sftp
>> case to establish the connection at the beginning of the transaction
>> (up to 5 seconds, if not more).
>
> The numbers will depend a lot on your latency versus bandwidth. I also
> haven't actually looked at your plugin yet.

I hope I can assume that my ftp vs sftp access to the same server over
the same network has roughly the same latency/bandwidth.

The main part of my plugin is here:
http://bazaar.launchpad.net/~bialix/%2Bjunk/kftp/annotate/head:/cached_ftp.py
It wraps several methods of FtpTransport to insert the cache.

Actually it seems that creating a protocol decorator à la "cached+ftp://"
would be better, but the current implementation is mostly a quick hack and
proof of concept.

>>> If you do the cache invalidation, I think we would be fine bringing that
>>> into bzr core.
>> I'll prepare a patch then.
>>
>> One question: do you think it's OK to explicitly cache only pack files
>> (I'm checking the file extension now)? It seems like bad practice to
>> bring the knowledge about packs into the transport layer. Another way may
>> be to cache only big files, greater than some threshold, e.g. 64KiB or
>> so, so we won't cache lock info files, branch.conf, last-revision and
>> pack-names.

...

Revision history for this message
John A Meinel (jameinel) wrote :

...
> The main part of my plugin is here:
> http://bazaar.launchpad.net/~bialix/%2Bjunk/kftp/annotate/head:/cached_ftp.py
> It wraps several methods of FtpTransport to insert the cache.
>
> Actually it seems that creating a protocol decorator à la "cached+ftp://"
> would be better, but the current implementation is mostly a quick hack and
> proof of concept.

The only specific issue I have is that it is all cached in memory, and
we often have very large .pack files. (Robert had a repository that was
16GB in size, IIRC.) My largest bzr repo pack is 54MB.

So I would look into using 'tempfile' code to put content into a local
temporary file, and then return the data from that. You could probably
put small content in memory (say <64k or something). The concerns would be:
 a) If you hold the files open (using TemporaryFile, which deletes when
    closed), then you may run out of file descriptors (unlikely, as we
    do try to keep the number of files small, maybe <100)
 b) If you don't, then you have to figure out when to delete the local
    content.
    One option is to actually not delete the local content, considering
    pack files are named by md5sums. However, you'd want some sort of
    invalidation, because when we repack, the repo will delete those old
    files, but you wouldn't know about that locally.

Probably best to stick with TemporaryFile.
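
For example, something along these lines (illustrative sketch only; the
names and the 64k threshold are made up):

    import tempfile

    SMALL_FILE_LIMIT = 64 * 1024   # keep anything under 64KiB in memory

    class SpillingCache(object):
        """Keep small files in memory; spill large ones to anonymous
        temp files, which the OS reclaims when they are closed."""

        def __init__(self):
            self._entries = {}    # relpath -> bytes or open temp file

        def store(self, relpath, data):
            if len(data) < SMALL_FILE_LIMIT:
                self._entries[relpath] = data
            else:
                f = tempfile.TemporaryFile()   # deleted on close/exit
                f.write(data)
                self._entries[relpath] = f

        def read(self, relpath, offset, length):
            entry = self._entries[relpath]
            if isinstance(entry, bytes):
                return entry[offset:offset + length]
            entry.seek(offset)
            return entry.read(length)

        def invalidate(self, relpath):
            entry = self._entries.pop(relpath, None)
            if entry is not None and not isinstance(entry, bytes):
                entry.close()    # frees the file descriptor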

>

> If I understand correctly I should implement the _readv method for
> FtpTransport to override the following default implementation:

Correct.

> Does bzrlib always use readv and never get? Or is readv used only for
> partial reads from pack/indices files? In the latter case indeed I
> should insert caching in _readv(), and not in get().
>
> Do I understand this correctly?

We always use readv() to read anything that we can read a subset of
(.pack and indices). We standardized on '.get_bytes()' for stuff like
meta control files (branch-format, branch.conf, etc.), though the default
implementation of .get_bytes() calls .get().read(). (This is because
.get() existed before .get_bytes().)
>
>> If you just implemented readv() and then only cached requests coming
>> from there, that would work pretty well.
>
>> Note also that you'll want to check into the append code a bit, too.
>> Sometimes we'll readv() from something we just appended to.
>
> The append method invalidates the cache at the moment.
>
>> I think that would be a better option than trying to find a heuristic
>> based on file size.
>
> Does "that" mean "readv"?
>
> Thank you.

Yes.

John
=:->

Revision history for this message
Gary van der Merwe (garyvdm) wrote :

On 25/08/2010 07:49, Alexander Belchenko wrote:
> Yes, I'm invalidating the cache on put, and on rename/move calls. But
> looking at logs it seems only put matters.

I would say you should rename the cache entry when the file is renamed. So if
you rename a to b then you should do:
cache['b'] = cache['a']
del cache['a']

Revision history for this message
Alexander Belchenko (bialix) wrote :

Gary van der Merwe writes:
> On 25/08/2010 07:49, Alexander Belchenko wrote:
>> Yes, I'm invalidating the cache on put, and on rename/move calls. But
>> looking at logs it seems only put matters.
>
> I would say you should rename the cache entry when the file is renamed. So if
> you rename a to b then you should do:
> cache['b'] = cache['a']
> del cache['a']

I'd prefer to invalidate the cache instead of moving content, because at
that point I don't know whether the operation will be successful or not.

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> Gary van der Merwe <email address hidden> writes:

    > On 25/08/2010 07:49, Alexander Belchenko wrote:
    >> Yes, I'm invalidating the cache on put, and on rename/move calls. But
    >> looking at logs it seems only put matters.

    > I would say you should rename the cache entry when the file is renamed. So if
    > you rename a to b then you should do:
    > cache['b'] = cache['a']
    > del cache['a']

Good idea, as it will partly address the repacks. Well, for people using
the plugin, that is ;)

I think using temp files is the safest bet to cover all cases anyway,
even if it's less efficient from one bzr invocation to the next.

Jelmer Vernooij (jelmer)
tags: added: ftp
Vincent Ladeuil (vila)
tags: added: performance
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
Jelmer Vernooij (jelmer)
tags: removed: check-for-breezy
Changed in brz:
status: New → Triaged
importance: Undecided → Low
Jelmer Vernooij (jelmer)
Changed in brz:
status: Triaged → Won't Fix