pull from launchpad is slow

Bug #582157 reported by Robert Collins on 2010-05-18
This bug affects 3 people
Affects: Bazaar
Importance: High
Assigned to: Andrew Bennetts

Bug Description

I did a pull of bzr+ssh://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel/ with local revno 10721, up to revno 10876.

This averaged 21KB/s:
748.335 Transferred: 15944KiB (21.3K/s r:11110K w:4834K)

This suggests a problem somewhere - we can generate our data much more quickly than that. I'll try to get a strace of the worker process and some network stats tomorrow.

----

Dear anyone experiencing this bug:

It will help us reproduce it if you:

- run the affected pull command with -Dhpss -Dhpssdetail in the debug flags, then attach the trace from ~/.bzr.log to this bug report
- perhaps also do that under strace and attach the output

tags: added: network performance udd


Robert Collins wrote:
> Public bug reported:
>
> I did a pull of bzr+ssh://bazaar.launchpad.net/~launchpad-
> pqm/launchpad/devel/ with local revno 10721, up to revno 10876.
>
> This averaged 21KB/s:
> 748.335 Transferred: 15944KiB (21.3K/s r:11110K w:4834K)
>
> This suggests a problem somewhere - we can generate our data much more
> quickly than that. I'll try to get a strace of the worker process and
> some network stats tomorrow.
>
> ** Affects: bzr
> Importance: Critical
> Status: Confirmed
>
>
> ** Tags: network performance udd
>
> ** Tags added: network performance udd
>

I've certainly seen 1MB/s @ Vincent's w/ low latency, while doing a
fresh branch, etc. So the source is capable of doing better.

John
=:->


Martin Pool (mbp) on 2010-05-26
Changed in bzr:
importance: Critical → High
Martin Pool (mbp) wrote :

I'll target this for 2.2, but it would be better broken into more specific bugs.

Martin Pool (mbp) on 2010-06-07
Changed in bzr:
milestone: none → 2.2.0
Martin Pool (mbp) wrote :

I tried this with my branch of launchpad's devel, which is about a week out of date.

0s start
10s connection opened
14s-25s four get_parent_map calls
25s-31s send get_stream_1.19
31s-111s receiving stream
111s get_tags_bytes

total r4592kB w84kB 40kB/s

The returned stream has some small byte parts, and I wonder if they could be causing choppy behaviour on the network.

Martin Pool (mbp) wrote :

So smart/protocol.py does buffer up the stream body so that it writes only in blocks of 1MB. However there are multiple layers of buffering underneath this, and we may be getting bad interactions between them.

Changed in bzr:
status: Confirmed → In Progress
Martin Pool (mbp) wrote :

By contrast downloading from chinstrap using lftp over ssh gets about 512kB/s from the same location.

Martin Pool (mbp) wrote :

and fetching a pack file over sftp from bazaar.launchpad.net (again lftp on the client and conch on the server) is about 80-100kB/s, under what should be similar network conditions.

It's plausible that allowing for a few delays on the server side and at startup, the bzr transfer is getting about 80kB/s when busy and then 50kB/s overall.

There's a complication here which is that the log file I originally tried downloading from chinstrap will be very compressible by ssh, so it's not really comparable. I'll use the same pack file. I can upload that to chinstrap at about 70kB/s, and bring it down at about 85kB/s.

So there's some measurement noise here, but we're not clearly much slower than using sftp to another machine in the datacentre, as far as bulk transfers go.

Martin Pool (mbp) on 2010-06-07
Changed in bzr:
assignee: nobody → Andrew Bennetts (spiv)
Martin Pool (mbp) on 2010-06-07
description: updated
Gordon Tyler (doxxx) wrote :

I don't know if you still need logs but I tried branching lp:bzr into a fresh standalone branch with the -Dhpss and -Dhpssdetail options using bzr 2.1.1 and bzr.dev r5276. They each took about 10 minutes on a 12Mbit cable internet connection and a decently powered client machine.

I'm attaching the logs: bzr-2.1.1.log and bzrdev-r5276.log.

Gordon Tyler (doxxx) wrote :

And here's the log for bzrdev-r5276.

Ian Clatworthy (ian-clatworthy) wrote :

Log file attached for branching lp trunk into an empty shared repo using bzr 2.1.1.

Martin Pool (mbp) wrote :

We haven't reproduced this slow behavior, but perhaps the situation is not exactly the same. I have a few ideas that just may help and after that, if we have no more data, we'll throw it back to Incomplete.

Here are the specific ideas:

 * remove undue buffering of outgoing hpss data <https://bugs.edge.launchpad.net/bzr/+bug/590638>

 * make it possible to see the server debug log on the client <https://bugs.launchpad.net/bzr/+bug/590633>

 * use a socketpair to talk to the ssh client process <https://bugs.launchpad.net/bzr/+bug/590637>

 * run tcptrace <http://www.tcptrace.org/> to see if there is bad tcp-level behaviour

 * log server cpu time per request

 * print more stats at the end of the command including cpu seconds, total rpcs, and number of large/small network reads

Martin Pool (mbp) wrote :

Thanks Gordon, that does help. It's interesting this shows us finishing up with vfs reads of the iix file: there may be another bug for that.

At the moment the most likely thing looks like bug 590638.

Gordon Tyler (doxxx) wrote :

I have an old Apple Cube (G3 PowerPC) running Debian, so I did some tests branching from that to my Windows desktop PC with the -Dhpss -Dhpssdetail options set, and then captured the .bzr.log from the server.

First, I did it using bzr-2.1.1 on the Windows desktop and bzr-2.0.1 on the Debian server as it exists in Debian stable.

Gordon Tyler (doxxx) wrote :

Then I did it using bzr.dev on both Windows desktop (compiled using VC2008) and the Debian server (compiled using gcc 4.3.2), r5276 and r5284, respectively.

Hope this helps. The Apple Cube is quite underpowered; it seemed to spend a lot of time at 80-90% CPU (just eyeballing it using top in another ssh window).

Martin Pool (mbp) wrote :

Robert and I studied this a bit today, by pulling from chinstrap which is a machine on the network pretty close to Launchpad. Amongst other things we watched the socket buffer size on both sides, and the strace of read/write system calls on both sides. We're using external SSH on both sides so the socket is not directly controlled by bzr.

What we observed is that the server's send buffer is often filled with quite large chunks (1MB) which gradually drain out, then there's a period where the server is not writing anything out. On the client, the recv buffer is almost always empty: SSH and bzr are accepting data as fast as it comes in.

This was in the context of the proposed merge for bug 590638. I didn't see conclusive proof that patch will make things faster, but I'm more convinced that it will help in some cases where there are small parts in the stream and where the sender is choppy. Without this patch, we leave some data in the application buffer so when we do start writing, the OS has a bit more data to send that could have been sent earlier.

There seem to be two independent problems then:
 0- There are multi-second periods where the bzr server process is cpu or disk bound and not writing out anything. Because these pauses are relatively long and happen near the start of the connection there's not enough data already buffered to keep the network full, so it goes idle.

 1- When the server process has provided a lot of data, the server sends it relatively slowly. The server OS has >1MB of data, the client advises it has a receive window over 500kB, and yet the server sends at only somewhere around 300kB/s.

We see a similar rate fetching a large random file over ssh or https. If we try multiple simultaneous connections the measurement is noisy but it seems like they each reach about this rate: at any rate the same bandwidth is not being divided between them. So this does look like the server is limiting the rate at which it transmits.

Martin Pool (mbp) wrote :

With a 500kB window size and 320ms rtt Sydney-London, we should be able to get up to about 1.5MB/s throughput, and clearly we're falling substantially short of that.

* Is internal SSH windowing the limiting factor?
* Should the client OS be advertising a larger window size, and if so why isn't it?
* Is the server backing off due to either congestion control or flow control?
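The 1.5MB/s ceiling above is just bandwidth-delay arithmetic: TCP can have at most one receive window of data in flight per round trip. A quick check of the figures from this comment:

```python
# Bandwidth-delay product behind the "about 1.5MB/s" estimate.
window_bytes = 500 * 1024  # advertised receive window (from the comment)
rtt_s = 0.320              # Sydney-London round trip

# At most one window per RTT can be in flight.
max_throughput = window_bytes / rtt_s  # bytes/second

print(max_throughput / (1024 * 1024))  # roughly 1.5 MB/s
```

The observed ~300kB/s is about a fifth of this ceiling, which is why the window size alone doesn't explain the slowness.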

Martin Pool (mbp) wrote :

Sending from my machine to chinstrap with just dd piped into ssh also runs at about 250kB/s, and it looks like the flow control window is not the limiting factor, but it needs more examination.


...
> We see a similar rate fetching a large random file over ssh or https.
> If we try multiple simultaneous connections the measurement is noisy but
> it seems like they each reach about this rate: at any rate the same
> bandwidth is not being divided between them. So this does look like the
> server is limiting the rate at which it transmits.

To confirm this, I ran:

 wget http://bazaar.launchpad.net/...

vs

 wget $URL & wget $URL

vs

 wget $URL & wget $URL & wget $URL

(basically, changing from 1 connection downloading, up to 3 concurrent
downloads of the same content.)

And I definitely see the difference in the connection. (I captured some
graphs.)

I definitely get more thorough utilization with multiple streams.

What is really strange, though, is that if I go to Vincent's machine, I
get:

  2010-06-25 15:45:56 (1.50 MB/s) ...

On the exact same file.

Is it possible that someone outside of Launchpad is doing the bandwidth
capping?

(Note that my max bandwidth is about 300kB/s, so I won't see anything
above what Martin posted as a single-connection limit.)

John
=:->


Martin Pool (mbp) wrote :

@John the most obvious difference between your machine and Vincent's is the different round trip time, which connects to the TCP windowing restriction discussed above. What ping time do you see to chinstrap from those two machines?

Generally speaking, when people talk about pull performance we should ask them for the rtt too. (It may be semi-obvious from looking at the delay on the hello rpc.)

If the throughput is linear with rtt that would be useful data.

To sum up so far, I think we can split the problem into two parts:

1- bzr is spending too much time thinking (therefore leaving the network idle), and is sending too much data. We can reproduce and fix this entirely locally, either with a slow network simulation or even just a fast local network and looking at the cpu usage and the bytes transmitted. Any improvement in them will help.

2- Bulk TCP transfers from Launchpad do not use the full bandwidth of the client's local link. This seems independent of the first, because it's also slow fetching over SSH from a DC server. We can look at other downloads from the DC (from PPAs, maybe from cdimage) to try to work out if it is an SSH limit, if it's TCP windowing or congestion, or if it's traffic management somewhere in between. It may be possible to tweak tcp settings on either the client or the server to get better results.

I'm moderately confident that there is not too much middle ground or interaction between them: bzr sees about the same performance as plain SSH so we're probably not making things worse, and during the bulk transfer phase we do seem to keep the send buffers full and the receive buffers empty.

Let's keep this bug open for the overall problem, link to bugs in bzr for specific aspects of #1, and file a bug against lp itself for #2.

Martin Pool (mbp) wrote :

Actually because there's a lot of data here already, let's keep this for the network-level issues until we identify something more specific.

Fetching an image over http from cdimage.ubuntu.com gets up to about 1.8MB/s which is close to my local link's speed. Wireshark shows the rcv window size going up to about 2.2MB.

Interestingly with wireshark showing a live capture the speed fell back to about 650kB/s - there is some noise but it might actually be interfering with the speed of the transfer. At any rate this does seem to show that transfers between here and the London DC can go substantially faster than 300kB/s.

John A Meinel (jameinel) wrote :


Martin Pool wrote:
> @John the most obvious difference between your machine and Vincent's is
> the different round trip time, which connects to the TCP windowing
> restriction discussed above. What ping time do you see to chinstrap
> from those two machines?
>

well, and his peak bandwidth :).

My rtt is about 150ms according to ping -c 100. I'm sure it varies based
on time of day.

I just tested it on Vincent's machine, and got a startling 120ms time.
Extra surprising because I could swear that Vincent had 20ms time. Which
makes me wonder if vila was playing with 'tc' and left a 100ms slow jump
in his network trace.

It is reproducible, though. And I still get 1.5MB to Vincent's machine,
and have difficulty hitting 300kB/s on mine.

The URL I'm testing is:

http://bazaar.launchpad.net/~bzr-pqm/bzr/bzr.dev/.bzr/repository/packs/a1bd96ce1421fe7d0b8d357b708a73c7.pack

Which is the 30MB pack file in the bzr repository. (The only way for me
to get close to maxing out the connection is to do multiple concurrent
downloads.)

John
=:->

John A Meinel (jameinel) wrote :


Martin Pool wrote:
> @John the most obvious difference between your machine and Vincent's is
> the different round trip time, which connects to the TCP windowing
> restriction discussed above. What ping time do you see to chinstrap
> from those two machines?

It seems to depend on time of day. Vincent tested after seeing my
original mail, and said he, indeed, saw 20ms. We just tested again right
now (17:00 UTC) and we both see 120ms to chinstrap or bazaar.lp.net.

Actually, at this instant, my local connection is at 110ms (so actually
faster than Vincent's.)

And while it does do a little bit better job saturating the bandwidth, 2
concurrent downloads is flatlined at cap, 1 is still 'bouncy'. (I made
sure to test from my server which has a wired connection, to eliminate
any noise from my house wireless.)

So even though the ping times are comparable, I certainly see far more
download speed to Vincent than myself. (He has a bigger pipe, but you
would think we could fully saturate my pipe with a single download. I
*do* see a full saturation downloading a single tar.bz2 from kernel.org.)

IOW, it doesn't seem to be strictly linear vs ping time.

Then again, my ping time never gets as low as 20ms, so it may be
something at this time of day that de-prioritizes ICMP?

John
=:->

John A Meinel (jameinel) wrote :

ping on this. The bug is targeted to 2.2, and one of the potential fixes for it is present (I believe this is the socketpair change). The bug itself is a bit nebulous since no specific cause or fix has landed. Should this be put back to Confirmed, or closed and a new bug filed?

Andrew Bennetts (spiv) wrote :

Bug 590637 (use socketpair to talk to ssh subprocess) and bug 590638 (hpss buffers too much) have both been fixed since this bug was filed. My guess is that 590638 is more likely to make a difference, but that's a server-side issue in this case (because the server is doing all the writing, not the client.) Launchpad is still serving using 2.1.0 (according to 'bzr ping lp:bzr', which reports the 'Software version' HPSS response header), which doesn't have that fix.

So I propose leaving this bug In Progress until Launchpad upgrades their bzr server to 2.2, and then re-evaluating where we are at. I believe that is planned to happen fairly soon now that 2.2 has been released.

(I suppose we could mark the bzr part of this bug as Incomplete, and open this bug on launchpad-code too as In Progress, if using In Progress like that bugs you too much.)

Andrew Bennetts (spiv) wrote :

> So I propose leaving this bug In Progress until Launchpad upgrades their bzr server to 2.2, and then re-evaluating where we are at.

Well, 'bzr ping lp:bzr' reports that LP is running 2.2.0, so I tried the situation in the bug description: pulling r10876 of bzr+ssh://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel/ into a (tree-less) branch of r10721. I did this from my laptop here in Sydney, so the other side of the world from the data centre.

Result: Transferred: 11582kB (114.0kB/s r:11397kB w:185kB)

So it's apparently much better now, which is good! There's more going on here, though: that kB/s figure is perhaps misleadingly pessimistic. It divides the total kB transferred by the entire life of the process, when (arguably) a more helpful value would be over the active lifetime of the connection, i.e. from connection created to last bytes sent or received. This means the kB/s figure includes e.g. tree-building time after the network chatter is (hopefully...) finished. So take care when interpreting that number.

Looking closer at the -Dhpss logging: the get_stream request was sent at 28.7s and the response was finished some time before 92.9s, and the total bytes sent in that one response was 10992877 (combined *.pack and *.?ix files on disk after the transfer are 9.7M, FWIW), so that's ~170kB/s for the stream itself. The original report was 21K/s (and nearly 13 minutes, rather than nearly 2).
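The ~170kB/s figure falls out directly from the numbers in the log quoted above (it is an upper bound on the duration, since the response finished "some time before" 92.9s, so the real rate may be slightly higher):

```python
# Checking the stream-only throughput from the -Dhpss log figures above.
stream_bytes = 10992877    # total bytes in the get_stream_1.19 response
start_s, end_s = 28.7, 92.9  # request sent / response finished (log timestamps)

rate_kb_s = stream_bytes / (end_s - start_s) / 1000

print(round(rate_kb_s))  # roughly 170 kB/s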

I think that's probably good enough to consider this specific report closed. What do other people think?

Martin Pool (mbp) wrote :

That's certainly much better. More could be done but perhaps this
specific bug should be closed, and we can identify some others to do
next.

A few questions:

 * how close is 170kB/s to saturating your local link?
 * are there any obviously inefficient hpss calls (there probably are
too many graph calls, and that would be clearly a different bug)
 * is 11.5MB a reasonable size for the amount of actual change in that
repo from r10721 to r10876?

> Looking closer at the -Dhpss logging: the get_stream request was sent at
> 28.7s and the response was finished some time before 92.9s, and the
> total bytes sent in that one response was 10992877 (combined *.pack and
> *.?ix files on disk after the transfer are 9.7M, FWIW), so that's
> ~170kB/s for the stream itself.  The original report was 21K/s (and
> nearly 13 minutes, rather than nearly 2).

Taking 28.7s before starting to send bulk data is a bit high.

>
> I think that's probably good enough to consider this specific report
> closed.  What do other people think?

I think so. I don't like having very broad bugs open that are going
to stay open until we reach some kind of asymptotically good
performance. On the other hand it is nice to have a few clear leading
bugs to point users and developers to.

--
Martin

Andrew Bennetts (spiv) wrote :

Martin Pool wrote:
> That's certainly much better. More could be done but perhaps this
> specific bug should be closed, and we can identify some others to do
> next.
>
> A few questions:
>
> * how close is 170kB/s to saturating your local link?

Not very. IIRC in theory I ought to be able to do something like 10x
that, but when the data is coming from the other side of the world I
never expect that.

> * are there any obviously inefficient hpss calls (there probably are
> too many graph calls, and that would be clearly a different bug)

HPSS calls: 16 (1 vfs)

The sequence is, in summary:

  get(/~launchpad-pqm/launchpad/devel) -> ReadError
  BzrDir.open_2.1(~launchpad-pqm/launchpad/devel/)
  BzrDir.open_branchV3(~launchpad-pqm/launchpad/devel/)
  BzrDir.find_repositoryV3(~launchpad-pqm/launchpad/devel/)
  Branch.get_stacked_on_url(~launchpad-pqm/launchpad/devel/) -> NotStacked
  Branch.last_revision_info(~launchpad-pqm/launchpad/devel/)
  Repository.get_rev_id_for_revno(
    ~launchpad-pqm/launchpad/devel/', 10876,
    (11636, '<email address hidden>'))
  Repository.get_parent_map(...)
  Repository.get_parent_map(...)
  Repository.get_parent_map(...)
  Repository.get_parent_map(...)
  Repository.get_parent_map(...)
  Repository.get_stream_1.19(...)
  Repository.get_parent_map(...)
  Repository.get_parent_map(...)
  Branch.get_tags_bytes(...)

The two post-get_stream get_parent_map calls look a bit funny to me,
maybe related to a missing keys check?

Considering the number of revisions involved 7 get_parent_maps doesn't
seem excessive, though.

There's in principle some improvement we could make in the number of
calls involved in opening the bzrdir/branch/repo, including the
get_stacked_on_url and last_revision_info in the open_branch result for
example, but there's a bit of friction with the bzrlib APIs in doing
that all at once. Adding (and using) something like
"open_read_locked_branch(url)" in the API might be a fairly simple way
to reduce that friction?

> * is 11.5MB a reasonable size for the amount of actual change in that
> repo from r10721 to r10876?

Good question. I'm not sure of a convenient way to measure that, but
that fetch did add 1968 new revisions, so that's something like <6kB per
revision, which sounds pretty good!
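The per-revision estimate is simple division over the figures in this thread:

```python
# Sanity-checking the "<6kB per revision" estimate.
transferred_kb = 11582  # kB transferred by the pull (from the earlier comment)
new_revisions = 1968    # revisions added by the fetch

kb_per_rev = transferred_kb / new_revisions

print(round(kb_per_rev, 1))  # just under 6 kB per revision
```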

Alternatively, comparing a 'bzr pack'ed repo of before and after the
fetch the size difference of .bzr/repository is 2.7M.

So, I guess the answer here is: not bad, but about 4x larger than
optimal for our current compression... but of course achieving optimal
compression is pretty expensive. And we'll inevitably have some
overhead for discovery of where we differ from the remote repository.

[...]
> Taking 28.7s before starting to send bulk data is a bit high.

It is, although 10s of that run was waiting for the SSH connection to be
established (possibly including DNS lookups as well as the SSH handshake
itself; a second run only took 7.5s to establish an SSH connection), and
it's hard to do much about that.

Paul Sladen (sladen) wrote :

The setup itself is not so much a question of speed as one of perceived responsiveness. Try the exact following command, and time how long it takes until the first status line is output:

  bzr branch lp:ubuntu-font-family-website

(hint: there is another bug at play here, but what I'm seeking to do is highlight the uncertainty for the user)

Robert Collins (lifeless) wrote :

Paul, this bug really is about the concrete performance. I agree that
perceived performance matters too, but please file a new bug for that,
so that we keep this one at least moderately focused.

Andrew Bennetts (spiv) wrote :

Ok, I'm going to close this, as tentatively agreed in comment 25. The way to make further progress is with fresh reports of specific problems. Paul, please do file a bug about the perceived responsiveness of that command, for example.

Changed in bzr:
status: In Progress → Fix Released