no way to disable autopacking

Bug #494012 reported by Ernst
This bug affects 3 people
Affects: Bazaar
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

I'm running bzr 2.0.2 on Ubuntu 9.10. My server uses sftp and thus is 'dumb'. The size of my repository is 222M (according to du -hs .bzr) and it has 207 revisions (bzr revno). It uses the 2a repository format.

Yesterday, I committed some trivial changes (two files: a text file of a couple of KB and a 1.8 MB PDF). However, the commit took a long time and transferred about 70 MB. That is quite a lot; luckily I was at my university, where the upload bandwidth is quite high, but it still took much longer than normal. So, two problems:
- Unexpectedly, a lot of data is transferred
- Unexpectedly, the commit takes a lot of time

This is because bzr started to pack the server repository. I think this should not happen without warning and user confirmation; if, for example, you are on a flaky 3G connection (with a data limit) or in a hurry, this behavior is certainly not desired.
For example, I always commit at the end of the day, just before logging off; such a delay is not desirable then.

Hopefully, this behavior can be improved.

The corresponding log entries in .bzr.log:
Mon 2009-12-07 16:52:24 +0100
0.531 bzr arguments: [u'commit', u'-m', u'Work on doc: experiments']
0.685 looking for plugins in /home/user/.bazaar/plugins
0.686 looking for plugins in /usr/lib/python2.6/dist-packages/bzrlib/plugins
1.299 encoding stdout as sys.stdout encoding 'UTF-8'
2.255 opening working tree '/mnt/Documenten/Documenten'
3.497 ssh implementation is OpenSSH
5.145 preparing to commit
[14464] 2009-12-07 16:52:30.232 INFO: Committing to: sftp://<email address hidden>/home/user/bzr/trunk/
5.801 Selecting files for commit with filter None
[14464] 2009-12-07 16:52:40.801 INFO: modified Studie/VU/Afstuderen/doc/Report/experiments.tex
[14464] 2009-12-07 16:52:40.834 INFO: modified Studie/VU/Afstuderen/doc/Report/report.pdf
20.302 Using fetch logic to copy between CHKInventoryRepository('file:///mnt/Documenten/Documenten/.bzr/repository/')(<RepositoryFormat2a>) and CHKInventoryRepository('sftp://<email address hidden>/home/user/bzr/trunk/.bzr/repository/')(<RepositoryFormat2a>)
20.302 fetch up to rev {<email address hidden>}
36.326 Auto-packing repository <bzrlib.repofmt.groupcompress_repo.GCRepositoryPackCollection object at 0x910fb2c>, which has 13 pack files, containing 220 revisions. Packing 10 files into 1 affecting 10 revisions
36.429 repacking 10 revisions
37.108 repacking 10 inventories
37.820 repacking chk: 10 id_to_entry roots, 9 p_id_map roots, 326 total keys
39.836 repacking 307 texts
281.422 repacking 0 signatures
282.682 Auto-packing repository <bzrlib.repofmt.groupcompress_repo.GCRepositoryPackCollection object at 0x910fb2c> completed
[14464] 2009-12-07 16:57:07.194 INFO: Committed revision 207.
282.842 return code 0

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 494012] [NEW] bzr needs excessive amount of bandwidth for commiting (2a)

Ernst wrote:
> Public bug reported:
>
> I'm running bzr 2.0.2 on Ubuntu 9.10. My server uses sftp and thus is
> 'dumb'. The size of my repository is 222M (according to du -hs .bzr) and
> it has 207 revisions (bzr revno). It uses database format 2a.
...

> 282.682 Auto-packing repository <bzrlib.repofmt.groupcompress_repo.GCRepositoryPackCollection object at 0x910fb2c> completed

You are auto-packing over a 'dumb' transport, which means we have to
download and re-upload the content. You could:

1) Run the smart server, where autopacking is done server side.
2) Interrupting the auto-pack is 'safe'. I'm not sure whether the branch
will be updated and unlocked, but doing 'bzr push ... ^C; bzr break-lock;
bzr push' should make sure everything is uploaded and the branch history
is correct. Note that the *next* push will also try to autopack, until
one succeeds.

I guess if you are doing it with a bound branch and 'commit', it is a
bit harder to trigger at the right times.

3) I'm pretty sure there is already an open bug about having autopack be
something you can disable, but it certainly is something we want to do
automatically so that people don't have to worry about it in normal
operation.

4) I'll also note that once a large auto-pack is done, it will be an
order of magnitude more commits before we do it again (e.g., if you
commit from scratch, we repack everything at 10 commits, 100 commits,
1,000 commits, 10,000 commits, etc.).

So the likelihood of you encountering this again soon is quite low. You
can also manually run "bzr pack sftp://" at an opportune time, to reduce
the chance that it happens automatically.

5) If you are interested in (3), I think a config option such as
"autopack = False", read via
"Branch.get_config().get_user_option('autopack')", would be reasonable
(possibly as a repository-level config?). Then you could set all your
sftp locations to not autopack; see the sketch after this list.

6) I'll also note that sftp performance is often fairly poor, simply
because of sftp limitations: reading a file requires an OPEN call, a
READ call, and then a CLOSE call, so it takes 3 round trips to fetch any
content. We do try to prefetch etc. when we can.
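
A minimal sketch of what (5) could look like. Only the
Branch.get_config().get_user_option() lookup is existing bzrlib API;
the option name, the value parsing, and the call site are made up for
illustration:

    from bzrlib.branch import Branch

    def autopack_enabled(branch):
        # get_user_option() returns the raw option string from
        # bazaar.conf, locations.conf, or branch.conf, or None when
        # the option is not set anywhere.
        value = branch.get_config().get_user_option('autopack')
        if value is None:
            return True  # default: autopacking stays on
        return value.strip().lower() not in ('false', '0', 'no', 'off')

    # Usage (URL illustrative):
    #   b = Branch.open('sftp://user@example.com/home/user/bzr/trunk')
    #   if not autopack_enabled(b):
    #       ...  # skip the repack

Since get_user_option() also consults per-location sections in
~/.bazaar/locations.conf, this would let you turn autopacking off for
just the sftp locations.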

John
=:->


Revision history for this message
John A Meinel (jameinel) wrote :

I couldn't find a bug for autopacking, so I'm co-opting this one.

summary: - bzr needs excessive amount of bandwidth for commiting (2a)
+ no way to disable autopacking
Changed in bzr:
importance: Undecided → Medium
status: New → Confirmed

Revision history for this message
Ernst (ernst-blaauw) wrote :

Thanks for your help. Sadly, a smart server is not an option, as I just have cheap web hosting that only provides ssh+sftp; running a bzr daemon on the server is not possible.

One way to make the process smarter would be to use the packed local data, as my local repository is a full checkout of the server, so the data is already available locally. (As download is magnitudes faster than upload, this would only improve the total duration by about 10%, I think, but it would still be useful.)

I read somewhere that bzr is optimized more for round trips than for bandwidth; it assumes plenty of bandwidth is available and that latency is the limiting factor (I don't know whether this is true, or whether it still holds for 2a). If it is still the case, would it be an idea to have an option to optimize the data transfer for bandwidth instead of latency, for people on 3G for example? (Such connections here in The Netherlands often have high speeds and low latency, but do come with a data limit.)

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 494012] Re: no way to disable autopacking

On Fri, 2009-12-11 at 00:10 +0000, Ernst wrote:
>
> I read somewhere that bzr is optimized more for round trips than for
> bandwidth; it assumes plenty of bandwidth is available and that
> latency is the limiting factor (I don't know whether this is true, or
> whether it still holds for 2a).

It's never been the case. bzr tries to transfer little data and use as
few round trips as possible, but there is a relationship between the two.

Packing occasionally is necessary to prevent a data * latency multiplier
effect from building up over time; it doesn't happen on every commit,
and if you're working over 3G, turning it off would be a very bad idea.

In short:
 - use a smart server (it will pack for you)
 - or performance will get linearly worse, at approximately 5 * your RTT
per commit (for example, at a 100 ms RTT, twenty unpacked commits add
roughly ten seconds of round-trip overhead to each subsequent operation).

-Rob

Revision history for this message
Andrew Bennetts (spiv) wrote :

Ernst wrote:
> Thanks for your help. Sadly, a smart server is not an option, as I
> just have cheap web hosting that only provides ssh+sftp; running a bzr
> daemon on the server is not possible.

Note that the smart server doesn't require a daemon. If you can get bzr
installed on the remote host then bzr+ssh should Just Work, as it simply
executes 'bzr serve --inet ...' on the remote host. I realise even that
might not be possible with cheap web hosting, but I'm clarifying just
in case it helps.
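
For example, if the local tree is a checkout bound to the sftp URL,
rebinding it to the same path over bzr+ssh is enough (host and path are
illustrative, not Ernst's actual server):

    bzr unbind
    bzr bind bzr+ssh://user@example.com/home/user/bzr/trunk

Each operation then spawns 'bzr serve --inet' over the normal ssh login,
and the process exits when the client disconnects, so nothing has to
stay running on the server.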

Revision history for this message
Eric Siegerman (eric97) wrote :

Rather than a boolean "disable autopacking completely" flag, how about a configurable limit, something along the lines of "don't automatically create a pack larger than X"? This limit should apply only to autopacking, not to an explicit "bzr pack".

This would let one choose, for one's particular circumstances, an appropriate tradeoff between the maximum acceptable hit at commit time and the (5 * RTT * #packs) hit on every interaction. One could then manage the latter by running explicit "bzr pack"s at times of one's own choosing.

Note that the limit should be used only as an a-priori estimate when deciding whether to crunch a given set of packs; if, after the fact, the new pack ends up exceeding the limit, *don't* throw it away on that account -- the work has already been done. Obviously, if a single commit exceeds the limit, that's OK; the resulting pack will simply never be autopacked, but will live on its own until the next explicit "bzr pack".
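
A rough sketch of the guard this proposal describes (entirely
hypothetical: bzr has no such option today, and in a real patch the
sizes would come from the pack collection rather than being passed in):

    # Hypothetical pre-check for the size-limited autopack proposal.
    def should_autopack(pack_sizes_bytes, max_autopack_bytes):
        # None means no limit configured: keep today's behavior.
        if max_autopack_bytes is None:
            return True
        # A-priori estimate only: if the combined pack still ends up
        # larger than the limit, it is kept anyway; the work is done.
        return sum(pack_sizes_bytes) <= max_autopack_bytes

    # Ten 4 MB packs fit under a 50 MB ceiling and get combined...
    assert should_autopack([4 * 1024 * 1024] * 10, 50 * 1024 * 1024)
    # ...but a 200 MB pack is left alone until an explicit 'bzr pack'.
    assert not should_autopack([200 * 1024 * 1024], 50 * 1024 * 1024)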

Jelmer Vernooij (jelmer)
tags: added: autopack packs
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy