"bzr push" uploads 1-2 MB just to send a one-line change revision

Bug #252945 reported by Dan Watkins
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
Bazaar | Fix Released | Undecided | Unassigned |

Bug Description

Hello. So I did this on my local machine:
bzr branch bzr+ssh://some_mysql.com/bzrroot/server/TRASH/mysql-maria-old
Branched 2641 revision(s).
cd mysql-maria-old/
xemacs README
bzr diff
=== modified file 'README'
--- README 2008-07-11 15:06:54 +0000
+++ README 2008-07-11 15:18:02 +0000
@@ -1,4 +1,4 @@
-test for bzr 2
+test for bzr 3
 This is a release of MySQL, a dual-license SQL database server.
 MySQL is brought to you by the MySQL team at MySQL AB.

bzr commit --message=please_ignore
Committing to: /home/mysql_src/bzrrepos/mysql-maria-old/
modified README
Committed revision 2642.
bzr push bzr+ssh://some_mysql.com/bzrroot/server/TRASH/mysql-maria-old/
Pushed up to revision 2642.

Before doing the "bzr push" above, I made sure to shut down all non-bzr network activity on my machine (IRC, mail, etc).
I could see, both with tcpdump (where I can grep for packets sent to some_mysql.com, which are 100%-sure due to the "bzr push"), and with iptraf (which tells the overall traffic over eth0), that "bzr push" consumed 749 KBytes over the network (700 KB incoming + 49KB outgoing).

Then I did another change:

=== modified file 'README'
--- README 2008-07-11 15:18:32 +0000
+++ README 2008-07-11 15:25:51 +0000
@@ -1,4 +1,4 @@
-test for bzr 3
+test for bzr 4

and commit and push, and this push transferred 2MB this time (80 KB outgoing).

That sounds strange. This was a major problem for a colleague on a very slow line who just wanted to push a small patch.

Tags: hpss push
GuilhemBichot (guilhem-bichot) wrote :

It does sound strange to have 1 or 2MB transferred over the network to push a one-byte diff.
Shouldn't the system be able to just send the diff (as well as a bit of metadata: date, committer, whatever), that is, 1kB or so?

Jelmer Vernooij (jelmer)
tags: added: hpss
tags: added: push
Robert Collins (lifeless) wrote :

Please retest with bzr 1.16 on both server and client.

Changed in bzr:
status: New → Incomplete
GuilhemBichot (guilhem-bichot) wrote :

I retested today.
Server is
Bazaar (bzr) 1.17dev
  Python interpreter: /usr/bin/python 2.4.4
  Python standard library: /usr/lib/python2.4
  bzrlib: /usr/lib/python2.4/site-packages/bzrlib
and repo format there is: "Shared repository (format: 1.14 or 1.9)"
Client is
Bazaar (bzr) 1.17dev
  from bzr checkout /home/mysql_src/logiciels/bzr_versions/dev
    revision: 4439
    revid: <email address hidden>
    branch nick: dev
  Python interpreter: /usr/bin/python 2.5.2
and repo format there is "Shared repository with trees (format: 1.14 or 1.9)."
I'm logged into my client machine; I have switched off VPN, email, IRC.
I do "bzr branch" to copy a branch located on the server onto my client machine.
Then I do a one-line change to a text file which is 1900 bytes long, and I commit and push:
it generates 98 kilobytes (KB) incoming and 17 KB outgoing (read from the output of "iptraf" and "tcpdump").
I repeat with a second similar change to another file (which is itself 215 bytes long), and I see again 98KB and 17KB.
So it's not 2MB anymore, but it's still 115KB for a one-line diff...

John A Meinel (jameinel) wrote :

If you run "bzr push -Dhpss" and attach the output in ~/.bzr.log it will show what requests are being made, and how many bytes of content. I don't think it measures the number of bytes on the wire...

As for disabling everything... I think you can just tell tcpdump to filter out everything that isn't SSH between the appropriate hosts.

Note that you should also just do "ssh host /bin/false" to give a baseline of how much traffic it takes just to establish the SSH connection.

John A Meinel (jameinel) wrote :

You could also include the attached diff. I'm pretty sure it will record the same bytes multiple times as various layers process them, but at least it will dump the info about what *we* think we are transmitting and receiving, and we can probably filter it after the fact to avoid the duplication.

(I'm hoping it will be obvious when we see 2 classes always follow each other with the same number of bytes processed.)

John A Meinel (jameinel) wrote :

Attached is a better (aka *working*) version of the patch.
I avoid at least one duplicate. I can't really tell whether there is any overlap or not. From what I can see, there is not.

John A Meinel (jameinel) wrote :

As an excerpt of the debug log, I see stuff like:
2.477 hpss call: 'get', '/srv/bzr/public/branches/bzr/jam-integration/.bzr/branch-format'
2.477 (to bzr+ssh://juju.arbash-meinel.com/srv/bzr/public/branches/bzr/jam-integration/)
2.477 medium request SmartClientStreamMediumRequest accepted 137 bytes
2.486 decoder ProtocolThreeDecoder accepted 28 bytes
2.486 decoder ProtocolThreeDecoder accepted 30 bytes
2.486 decoder ProtocolThreeDecoder accepted 1 bytes
2.486 decoder ProtocolThreeDecoder accepted 1 bytes
2.486 decoder ProtocolThreeDecoder accepted 1 bytes
2.486 decoder ProtocolThreeDecoder accepted 4 bytes
2.487 decoder ProtocolThreeDecoder accepted 6 bytes
2.487 result: ('ok',)
2.487 decoder ProtocolThreeDecoder accepted 1 bytes
2.487 decoder ProtocolThreeDecoder accepted 4 bytes
2.487 decoder ProtocolThreeDecoder accepted 35 bytes
2.487 decoder ProtocolThreeDecoder accepted 1 bytes
2.487 35 body bytes read

Which I think shows that we send a 137-byte request (upload) to read the branch-format file, and then receive 28+30+1+1+1+4+6 bytes to get the protocol wrapper, and then 1+4+1 bytes of more header/tail info, and then 35 bytes of actual content.

I'm not 100% sure, but I think that works out correctly. So I *think* you could sum up all of the "MediumRequest" lines to get the number of bytes uploaded, and then sum up the "ProtocolThreeDecoder" lines to get the number of bytes downloaded.

Note that my test was for a simple push that didn't change anything, so there may be more bits that I don't know about.
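
The summing suggested above can be sketched in a few lines of Python. This is only a sketch: the function name `sum_hpss_bytes` and the two regular expressions are assumptions modelled on the excerpt in this comment, and those "accepted N bytes" lines come from the debugging patch attached to this bug, so the pattern may need adjusting for other log output.

```python
import re

# Assumed line shapes, copied from the excerpt in this comment. They are
# produced by the debugging patch attached to this bug report.
UPLOAD_RE = re.compile(r"medium request \w*MediumRequest accepted (\d+) bytes")
DOWNLOAD_RE = re.compile(r"decoder ProtocolThreeDecoder accepted (\d+) bytes")


def sum_hpss_bytes(lines):
    """Return (uploaded, downloaded) byte totals for an iterable of log lines."""
    uploaded = downloaded = 0
    for line in lines:
        match = UPLOAD_RE.search(line)
        if match:
            uploaded += int(match.group(1))
            continue
        match = DOWNLOAD_RE.search(line)
        if match:
            downloaded += int(match.group(1))
    return uploaded, downloaded


# The byte-count lines from the excerpt above, used as sample input.
EXCERPT = """\
2.477 medium request SmartClientStreamMediumRequest accepted 137 bytes
2.486 decoder ProtocolThreeDecoder accepted 28 bytes
2.486 decoder ProtocolThreeDecoder accepted 30 bytes
2.486 decoder ProtocolThreeDecoder accepted 1 bytes
2.486 decoder ProtocolThreeDecoder accepted 1 bytes
2.486 decoder ProtocolThreeDecoder accepted 1 bytes
2.486 decoder ProtocolThreeDecoder accepted 4 bytes
2.487 decoder ProtocolThreeDecoder accepted 6 bytes
2.487 result: ('ok',)
2.487 decoder ProtocolThreeDecoder accepted 1 bytes
2.487 decoder ProtocolThreeDecoder accepted 4 bytes
2.487 decoder ProtocolThreeDecoder accepted 35 bytes
2.487 decoder ProtocolThreeDecoder accepted 1 bytes
2.487 35 body bytes read
""".splitlines()

uploaded, downloaded = sum_hpss_bytes(EXCERPT)
print("uploaded:", uploaded, "downloaded:", downloaded)
```

For the excerpt this gives 137 bytes uploaded and 112 bytes downloaded (the wrapper, the header/tail bytes, and the 35 body bytes). To total a whole log, pass the open file object for ~/.bzr.log instead of `EXCERPT`.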

Martin Pool (mbp) wrote : Re: [Bug 252945] Re: "bzr push" uploads 1-2 MB just to send a one-line change revision

2009/6/20 John A Meinel <email address hidden>:

> As for disabling everything... I think you can just tell tcpdump to
> filter out everything that isn't SSH between the appropriate hosts.

You want something like this:

  sudo tshark -w /tmp/capture -p -i wlan0 -f 'ip host 10.113.3.27 and tcp port 22'

or

  sudo tcpdump -p -w /tmp/capture -i wlan0 'ip host 10.113.3.27 and tcp port 22'
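
Once a capture file exists, the bytes on the wire can be totalled without extra tools. The following is a rough sketch that assumes the capture is in the classic pcap format (what `tcpdump -w` writes); some tshark versions write pcapng instead, which this code rejects rather than parses. The function name `pcap_totals` is made up for this example.

```python
import struct

# Magic numbers of the classic pcap file format, as they appear on disk,
# mapped to the struct byte-order prefix to use for the rest of the file.
_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def pcap_totals(data):
    """Return (packet_count, wire_bytes) for the raw bytes of a pcap file."""
    order = _MAGICS.get(data[:4])
    if order is None:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    offset = 24  # size of the global file header
    packets = wire_bytes = 0
    while offset + 16 <= len(data):
        # Per-packet record header: ts_sec, ts_frac, incl_len, orig_len.
        _sec, _frac, incl_len, orig_len = struct.unpack_from(
            order + "IIII", data, offset
        )
        packets += 1
        wire_bytes += orig_len  # original length, even if the capture truncated it
        offset += 16 + incl_len  # skip the record header and the stored bytes
    return packets, wire_bytes


# Usage, assuming the capture path from the commands above:
#   with open("/tmp/capture", "rb") as f:
#       print(pcap_totals(f.read()))
```

The total includes link-layer, IP, TCP and SSH overhead for both directions, so it will be larger than what bzr itself reports sending and receiving.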

GuilhemBichot (guilhem-bichot) wrote :

Hi folks. Uh, thanks for all the comments and the custom debugging branch... however this requires a bit more time than I have available these days; what I can tell is:
- opening an ssh session and closing it takes 10kB (whether ssh-ing into my own machine over loopback or ssh-ing to a real remote host)
- I have reproduced the ~100kB figure like this:
 * I have a shared repo "repo1" of MySQL branches on my machine
 * I create a new shared repo "repo2" on my machine
 * cd repo2
 * bzr branch bzr+ssh://localhost/repo1/mysql-6.0
 * commit one-line change, push
After eliminating the overhead of opening and closing the ssh connection, remains ~100kB of transfer (actually 200 according to iptraf, but divided by two as it's a single machine).
"bzr push" ran with -Dhpss and this is of interest in .bzr.log:
4.277 hpss call w/body: 'Repository.get_parent_map', 'm/bzrrepos/', 'include-missing:', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>...


Andrew Bennetts (spiv) wrote : Re: [Bug 252945] Re: "bzr push" uploads 1-2 MB just to send a one-line change revision

GuilhemBichot wrote:
[...]
> After eliminating the overhead of opening and closing the ssh connection, remains ~100kB of transfer (actually 200 according to iptraf, but divided by two as it's a single machine).
> "bzr push" ran with -Dhpss and this is of interest in .bzr.log:
> 4.277 hpss call w/body: 'Repository.get_parent_map', 'm/bzrrepos/', 'include-missing:', [50+ revisions elided]
> 4.277 3 bytes
> 4.552 result: ('ok',)
> 4.554 58819 body bytes read
>
> see, 59kB already. So maybe it's just sending the whole list of revision ids for each push...?
> I'm sorry, I'm not going to debug it further, the steps above should make it straightforward to reproduce on your side.
> Note that it's not a deadly problem anymore (100kB is better than 1MB).

This is by design, although we can look at tweaking the values.

The basic tradeoff here is roundtrips vs. bandwidth. Our main interest has
been making bzr operate faster on the network, rather than using the minimum
amount of traffic. Sometimes that means transferring too much data to avoid
extra roundtrips later; roughly speaking we tend to treat sending up to 64k
of data as being essentially as fast as sending 1k, whereas each round trip
can easily be hundreds of milliseconds.

In this case, you only have a single new revision, but bzr doesn't know that
initially. Instead it has to query the remote server to find out which of
our local revisions are present on the remote side. In the interests of
cutting down round trips the client queries the server for at least 50
revisions at a time, rather than just 1 at a time. Similarly, the response
from the server (which is gzipped) will contain results for the requested
revisions *plus* results from adjacent parts of the revision graph until the
compressed data is estimated to be roughly 64k (the estimate is very rough
and pessimistic, so in this case it's actually 58k, but that's not a big
deal).

The thing to realise here is that if you had 2, or 10, or probably even 40
new revisions (depending on precise graph shape) it would still be a single
request/response pair. And if you are uploading 100s of new revisions this
batching process is much much better than the strictly minimal one-at-a-time
approach.

We do have some plans to get even more bang-for-buck out of these requests
(querying for a sparser set of revisions rather than just adjacent
revisions, perhaps gzipping the get_parent_map request as well as the
response), but I wouldn't expect this to make a dramatic difference to the
bandwidth cost of the initial request/response pair.
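
The tradeoff described above can be put into a toy cost model. This is only a sketch: the function name `query_cost`, the 0.3 s round-trip latency and the 100 KiB/s bandwidth are assumptions chosen for illustration, and real bzr traffic is not this regular. The batch size of 50 and the roughly 64k response are the figures from the explanation above; the 1 KiB one-at-a-time exchange is an assumed stand-in for a "strictly minimal" query.

```python
import math


def query_cost(new_revisions, batch_size, bytes_per_round_trip,
               latency_s, bandwidth_bytes_per_s):
    """Return (round_trips, seconds) to ask the server about new_revisions.

    Each round trip costs one latency plus the time to move its payload.
    At least one round trip is needed even when nothing is new.
    """
    round_trips = max(1, math.ceil(new_revisions / batch_size))
    total_bytes = round_trips * bytes_per_round_trip
    seconds = round_trips * latency_s + total_bytes / bandwidth_bytes_per_s
    return round_trips, seconds


LATENCY_S = 0.3            # assumed round-trip time
BANDWIDTH = 100 * 1024     # assumed link speed, bytes per second

for revisions in (1, 40):
    batched = query_cost(revisions, 50, 64 * 1024, LATENCY_S, BANDWIDTH)
    minimal = query_cost(revisions, 1, 1024, LATENCY_S, BANDWIDTH)
    print(revisions, "revisions: batched", batched, "one-at-a-time", minimal)
```

Under these assumed numbers, one new revision is cheaper to query one at a time, while 40 new revisions take a single batched round trip instead of 40 small ones. On a much slower link the bandwidth term dominates and the balance shifts back toward small requests.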

Robert Collins (lifeless) wrote :

Marking as fixed.

Changed in bzr:
status: Incomplete → Fix Released
GuilhemBichot (guilhem-bichot) wrote :

Just a final comment (I agree with closing the bug). The bug was initially reported on behalf of someone with very low bandwidth, on a GPRS connection in an isolated place: 5 kbits/s, i.e. about 0.6 kbytes/s. At that rate, 64 kbytes takes about 100 seconds, which isn't the same as 1 kbyte (2 seconds). 100 seconds is more or less fine; but when the amount was 1MB, as when the bug was reported, it meant about 1600 seconds, which was a bit too much for such a small patch.
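
The arithmetic behind these figures is easy to check. A minimal sketch, assuming the full 5 kbits/s is available for payload (no protocol overhead, no latency); the function name `transfer_seconds` is made up for this example.

```python
def transfer_seconds(size_kbytes, link_kbits_per_s):
    """Seconds needed to move size_kbytes over a link of link_kbits_per_s."""
    kbytes_per_s = link_kbits_per_s / 8  # 8 bits per byte
    return size_kbytes / kbytes_per_s


GPRS_KBITS_PER_S = 5  # the link speed quoted in this comment

for size_kbytes in (1, 64, 1000):
    seconds = transfer_seconds(size_kbytes, GPRS_KBITS_PER_S)
    print(size_kbytes, "kbytes:", seconds, "seconds")
```

At 5 kbits/s (0.625 kbytes/s) this gives 1.6 seconds for 1 kbyte, 102.4 seconds for 64 kbytes, and 1600 seconds for 1 MB taken as 1000 kbytes.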

gavenkoa (gavenkoa) wrote :

I am hitting this issue too.

A pull of the Emacs source generates 4 MiB of network traffic, yet when I take the diff and compress it I get only 10 KiB!

I use a UMTS/WCDMA link to the Internet at 20-30 KiB/s, and I hate Bazaar!

The changes before the ones I pulled include an update to binary files (.pdf docs), so fetching all of the last 50 changes is too expensive.

# time bzr pull
 M doc/emacs/ChangeLog
 M doc/emacs/display.texi
 M doc/emacs/frames.texi
 M doc/lispref/ChangeLog
 M doc/lispref/numbers.texi
 M doc/lispref/objects.texi
 M doc/misc/ChangeLog
 M doc/misc/faq.texi
 M etc/NEWS
 M lisp/ChangeLog
 M lisp/macros.el
 M lisp/subr.el
 M src/ChangeLog
 M src/keymap.c
All changes applied successfully.
Using saved parent location: http://bzr.savannah.gnu.org/r/emacs/trunk/
Now on revision 99612.

real 3m27.109s <-------------- wait for 3 min to get 10 KiB of data! You bastard!
user 0m0.015s
sys 0m0.015s
# bzr diff -r 99600..99612 >.diff
# gzip .diff
# ls -l .diff*
-rw-r--r-- 1 sasha Администраторы 9636 Mar 3 18:27 .diff.gz

gavenkoa (gavenkoa)
Changed in bzr:
status: Fix Released → Incomplete
John A Meinel (jameinel) wrote :

If emacs supported bzr+ssh rather than only http and sftp, you would see significantly better results.

Changed in bzr:
status: Incomplete → Fix Released
Karl Fogel (kfogel) wrote :

There is a ticket at savannah.gnu.org for this:

  https://savannah.gnu.org/support/?107077

(And also https://savannah.gnu.org/support/index.php?107143, but it's mainly a pointer to the former ticket.)

If someone wants to volunteer to become a Savannah admin and help out with this, this is probably the place to start (courtesy of Sylvain Beucler):

  http://savannah.gnu.org/maintenance/

Sylvain adds:

  "notably, http://savannah.gnu.org/maintenance/HowToBecomeASavannahHacker
(slightly biased toward Project Approvals, but describes team work) and http://savannah.gnu.org/maintenance/SavannahArchitecture)"
