Bug #423804 “Failure to saturate bandwidth downloading lp:bzr” : Bugs : Bazaar

Revision history for this message

John A Meinel (jameinel) wrote on 2009-09-03:

#1

Bandwidth usage display Edit (3.1 KiB, image/png)

Revision history for this message

John A Meinel (jameinel) wrote on 2009-11-17:

#2

get_stream_babune.png Edit (3.7 KiB, image/png)

I brought this up a bit in Mooloolaba. As a guess, I'm thinking that the pack-on-the-fly code may be slowing down the fetch. Even worse, it may be interacting poorly with the Nagle algorithm, so that once the pack has completed, the TCP auto-negotiation has decided that it shouldn't send as much content.

It is also possible that we have some server side issues, where we aren't keeping the write buffer full. However, I would suspect the source side before I suspect the target side.

In contrast I'm uploading a picture of network throughput for iterating over 'get_stream()' from Babune to my machine. Note that it is *completely* flat. Something is throttling, though, as it transferred at a steady 100kB/s when I should have at least 300kB/s of download bandwidth.

I'll try a few more tests, including checking to see what the throttling issue is, what 'branch' looks like from babune, and whether 'get_stream()' from launchpad is also as flat.

Revision history for this message

John A Meinel (jameinel) wrote on 2009-11-17:

#3

get_stream_lp.png Edit (6.3 KiB, image/png)

I also wanted to mention that the 'dips' that are shown there are specifically in the transition from signatures+revisions+inventories => the first chk_bytes pages => the second chk_bytes pages => texts.

I did test get_stream() against launchpad, and this is what I saw.

Note that the peak here is about 3x that of streaming from babune (so we are hitting the bandwidth cap on my system).

The major gaps are at the same point, though here the downtime is greater. I wonder if we are hitting CPU limits on launchpad server side. I also don't have any idea of launchpad code hosting's current load, or whether there are other limiting factors between here and launchpad.

What is also very strange is the 'ramp up' that is seen repeatedly. Where you get a severe drop, followed by a fast ramp, slow ramp, fast ramp, peak and then dropoff again.

I'm also not sure what bzr version launchpad is running, I'm fairly confident that I'm connecting to bzr-2.0.2 on Babune.

So it *could* be that just saturating my ISP's bandwidth because LP has so much more capacity, is causing all sorts of networking craziness. I'd like to test streaming from LP to babune, since that is a much lower latency connection, but I don't know how I would grab a graph of the bandwidth there.

Revision history for this message

John A Meinel (jameinel) wrote on 2009-11-17:

#4

get_stream_chinstrap_openssh.png Edit (3.6 KiB, image/png)

More graphs. This one is the bandwidth of downloading all of bzr from a freshly packed repository on chinstrap, rather than from bazaar.launchpad.net.
Note that while still a bit 'spikey' it stays much closer to peak bandwidth for the whole time.
The other difference here is that I'm using openssh as my ssh provider, rather than paramiko. I'll upload one with paramiko in a second. So there are still a fair amount of variables. I'll try a copy of production's exact repo (rather than freshly packed) and see if this changes anything.

Revision history for this message

John A Meinel (jameinel) wrote on 2009-11-17:

#5

get_stream_chinstrap_paramiko.png Edit (3.8 KiB, image/png)

This is connecting to chinstrap via paramiko. It seems a bit spikier, but the total download time is about the same.

Note also that I think the X and Y axis should be identical for all the recent graphs.

Revision history for this message

John A Meinel (jameinel) wrote on 2009-11-17:

#6

get_stream_chinstrap_paramiko_unpacked.png Edit (3.5 KiB, image/png)

This next graph is 'get_stream()' from chinstrap using an 'lftp mirror' copy of bazaar.launchpad.net's bzr repository.
Unfortunately there was a tiny data spike, which caused the Y axis to resize, but the X axis is the same. The peak seen in this graph is the same as the peaks elsewhere. So we again manage to hit peak bandwidth, but it is much choppier.

Looking at chinstraps.bzr.log I see *lots* of:
263.431 creating new compressed block on-the-fly in 0.000s 2092406 bytes => 10517 bytes

This indicates that the thrashing we noticed before really does decrease throughput. We fixed size-on-disk by having the client recompress the stream.

Revision history for this message

John A Meinel (jameinel) wrote on 2009-11-17:

#7

get_stream_chinstrap_paramiko_unpacked_unordered.png Edit (3.3 KiB, image/png)

This is another chinstrap => local using the mirror of production.
In this case, I have a small patch to the source codebase. Specifically:
=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- bzrlib/repofmt/groupcompress_repo.py 2009-10-23 17:27:45 +0000
+++ bzrlib/repofmt/groupcompress_repo.py 2009-11-17 21:41:18 +0000
@@ -1021,7 +1021,7 @@
         super(GroupCHKStreamSource, self).__init__(from_repository, to_format)
         self._revision_keys = None
         self._text_keys = None
- self._text_fetch_order = 'groupcompress'
+ self._text_fetch_order = 'unordered'
         self._chk_id_roots = None
         self._chk_p_id_roots = None

In other words, it doesn't change the signatures/repository/inventory or chk streaming, but it changes the text streaming to be done 'unordered' rather than trying to recompute the 'groupcompress' "proper" order.

This is better for networking at least. The runtime dropped from 350s down to 260->290s. And most of the "creating new compressed block on-the-fly" lines are gone.

I don't have a good feel for what this will mean client-side for final-disk size. As it will be trying to collapse groups, but it won't have the data in an optimal order.

Revision history for this message

Andrew Bennetts (spiv) wrote on 2009-11-17: Re: [Bug 423804] Re: Failure to saturate bandwidth downloading lp:bzr

#8

John A Meinel wrote:
[...]
> I'm also not sure what bzr version launchpad is running, I'm fairly

My bzr ping plugin says bzr+ssh://bazaar.launchpad.net/ is running 2.0.0.

-Andrew.

Revision history for this message

John A Meinel (jameinel) wrote on 2009-11-17:

#9

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> John A Meinel wrote:
> [...]
>> I'm also not sure what bzr version launchpad is running, I'm fairly
>
> My bzr ping plugin says bzr+ssh://bazaar.launchpad.net/ is running
> 2.0.0.
>
> -Andrew.
>

Yeah, I got the same info when I tried dumping the arg dict from the
protocol request parser.

staging is also running 2.0.0, and mwhudson confirmed that it is likely
to be true.

However, chinstrap is running 2.0.2, and some tests with bzr.dev on that
machine showed the same results.

sorting groupcompress on the source is part of the problem, slowdowns
from having a non-optimally packed source repo is another.

John=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksDL6cACgkQJdeBCYSNAAMywgCdFPmF7x6I3p8ZwpzSEMiSrP7X
C1AAoLYu6vDSxNK7X1+1QhI+CbFWoyra
=JPjm
-----END PGP SIGNATURE-----

Revision history for this message

Robert Collins (lifeless) wrote on 2009-11-18:

#10

On Tue, 2009-11-17 at 23:20 +0000, John A Meinel wrote:

> sorting groupcompress on the source is part of the problem, slowdowns
> from having a non-optimally packed source repo is another.

I think its all going to be tied into windowing. I encourage the reading
thread / circular buffer experiment ;)

-Rob

Revision history for this message

John A Meinel (jameinel) wrote on 2009-11-20:

#11

stream_mem_test.py Edit (2.8 KiB, text/x-python)

I'm just attaching the script I'm using on the client for testing. This is the 'no-op' client so that we hopefully are only showing the server's ability to keep the pipe full.

Long term we need to optimize both sides. And make sure that the server keeps its end full, and the client is clearing out its end as fast as possible. But splitting the testing should make it easier to get there.

The script is adapted from my work on testing memory consumption, so it depends on Meliae, etc. But we can easily strip that out.

Revision history for this message

John A Meinel (jameinel) wrote on 2009-12-13:

#12

ssh_cat.png Edit (3.5 KiB, image/png)

I've determined that at least some of this is just my connectivity to Chinstrap. I'm attaching the network results for

ssh chinstrap cat .../.bzr/repository/xxx.pack > local.pack

Which is a 31MB file.

The whole transmission is very "peaky". I've also confirmed that it isn't just disk issues by running 2 times.

Revision history for this message

John A Meinel (jameinel) wrote on 2009-12-13:

#13

wget.png Edit (2.3 KiB, image/png)

I then tried a 'wget' from a bazaar.launchpad.net branch of bzr.dev. And I got similar results. Sorry the scale changed, something peaked above my actual bandwith cap...

Anyway, you can still see that it cycles up and down near the bandwidth max, and occasionally drops off considerably.

Revision history for this message

John A Meinel (jameinel) wrote on 2009-12-13:

#14

Note that I tried this on Babune, which has a significantly higher bandwidth and lower latency, and it seemed to generate a fairly stable 1.9+MB/s stream. So it might be my home network (I've had some flakiness with my wireless router, though I got the same results on the wired side of the router.)

I do wonder if this is a network configuration issue, though. As I've seen downloads from Steam, etc, that are just a rock-solid 300kB/s. Or maybe it is just because it is going transcontinentally?

Revision history for this message

Robert Collins (lifeless) wrote on 2009-12-13:

#15

ssh can introduce artifacts itself, as it has its own windowing going
on.

Interacting better with this needs some study. Also stdio redirection
may not keep as clean a buffer as needed ;).

I do think we need to test the impact of focusing on having a window
window for the tcp stack as a specific thing - and I'd do that before
worrying about the server side at all.

-Rob

Revision history for this message

John A Meinel (jameinel) wrote on 2009-12-14:

#16

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> ssh can introduce artifacts itself, as it has its own windowing going
> on.
>
> Interacting better with this needs some study. Also stdio redirection
> may not keep as clean a buffer as needed ;).
>
> I do think we need to test the impact of focusing on having a window
> window for the tcp stack as a specific thing - and I'd do that before
> worrying about the server side at all.
>
> -Rob
>

Sure. Though I'm seeing some of this via 'wget' as well, which would not
be using SSH. :)

For now, I'm not going to be spending too much more time on this. I may
try some small things here and there, but it looks like my network is
screwy enough that figuring out the signal vs noise is going to be a
pain. (I just saw the same 'cyclical' issues when watching a YouTube
video, so it is possibly an issue with how my ISP is doing throttling?)

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksmxC8ACgkQJdeBCYSNAANuWgCeKzp0Ypsq9YYg236RX5exxLmA
wwgAn09hquArMLTcx1weelnGu6Dq+BEQ
=rO7X
-----END PGP SIGNATURE-----

Revision history for this message

John A Meinel (jameinel) wrote on 2009-12-22:

#17

So I did a bit more debugging, since I was looking at this code a bit while trying to address the os.write code.

Anyway, it looks like the ssh process allows us about 1.6MB of in-transit data before it starts blocking.

This was detected by having a canned request return a lot of bulk data, and then having a trivial client that just reads the response 64kB at a time.

My network is still a bit strange, but I can see that the server-side claims to write as many as 30 64kB chunks that the client has not claimed to have read yet. This was done by breaking up a large request into 64kB chunks, and calling out.write() on each one, and timing how long it took, while at the same time, the local client reports when it reads 64kB.

This used ssh's stderr multiplexing to get the messages on stderr locally. Which I'm sure isn't perfectly aligned between processes, but it should be at least close.

Almost all of the write calls look like: wrote 65536 sub bytes in 0.000s
In that it takes <= 3ms for a write. Every so often you get:
wrote 65536 sub bytes in 2.143s

Which I assume means that ssh is blocking waiting for some of the buffer to clear up.

This means we have something less than 2MB of buffering that we get for bzr+ssh. (I assume this would be different with Launchpad's Twisted implementation.) Though it may also depend on the Window, etc settings.

Though according to Wireshark my peak "Win" is only about 260,000 bytes. (Wireshark also tells me I get a fair number of retransmissions, etc. Which I assume is because of something with my network, and not something bzr could do anything about.)

Anyway, it would appear that there isn't much to be done for bzr+ssh, since as near as I can tell, ssh is already handling the buffering for us. I wonder if it would help to decrease the 1MB internal buffer, so that we aren't too close to the limit of the ssh subprocess. Conceptually, that would allow us to multiplex a bit more. So that we don't wait and have the buffer empty before we then supply a huge content chunk that ends up blocking.

Something to consider, at least.