Asynchronous upload not working properly

Bug #387102 reported by Robin Munn on 2009-06-14
This bug affects 1 person
Affects: Duplicity
Importance: Medium
Assigned to: Peter Schuller

Bug Description

Duplicity version 0.5.09
Python version 2.6.2
OS: Ubuntu Linux 9.04 (Jaunty)
Filesystem being backed up: ext3
Repeatable: Yes, on my machine (see bottom of bug report for notes)
Log output: Didn't capture any when I did the backup that caused this bug report; will run a new backup and attach -v9 logs later on.

Bug description:

The --asynchronous-upload option isn't working the way I think it should on my system, where the limiting factor is bandwidth. (I.e., it takes a lot longer to upload a 50 megabyte file through my ADSL connection than it does to prepare the next file to be uploaded).

Intended behavior (I assume):

* Volume 1 is prepared.
* Volume 1 finishes preparing, starts uploading.
* While volume 1 is uploaded, volume 2 is prepared.
* Volume 2 finishes preparing; volume 1 is still uploading, so Duplicity waits for the upload to complete.
* Volume 1 finishes uploading; volume 2 immediately starts uploading.
* While volume 2 is uploaded, volume 3 is prepared.

Etc., etc., repeating the last three steps until all volumes are complete. This means that the upload bandwidth is being used almost constantly; by the time one volume has finished uploading, another volume is ready and waiting in /tmp.
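
The intended behavior above can be sketched as a small pipeline. This is only an illustration of the idea, not Duplicity's actual code: run_pipeline, prepare and upload are hypothetical names, and a single background worker thread stands in for the uploader.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(volumes, prepare, upload):
    """Prepare volume N+1 while volume N uploads.

    Hypothetical sketch: `prepare(vol)` builds a volume and returns its
    data, `upload(vol, data)` sends it. At most one upload is in flight,
    and at most one prepared volume is waiting for its turn.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        in_flight = None
        for vol in volumes:
            # Runs in the main thread while the previous upload (if any)
            # is still running in the worker thread.
            data = prepare(vol)
            if in_flight is not None:
                # The volume is ready; wait for the previous upload.
                in_flight.result()
            # Start uploading right away, then loop to prepare the next.
            in_flight = pool.submit(upload, vol, data)
        if in_flight is not None:
            in_flight.result()  # wait for the final upload
```

The important detail is that the loop goes straight back to preparing the next volume after every upload is started, not only after every second one.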

Actual behavior:

I just launched a duplicity backup using the following command (with a real username and server name, of course):

duplicity /home/username/projects scp://<email address hidden>/backups/projects/ --asynchronous-upload --verbosity 4 --volsize 50

I then opened a WinSCP view to watch the backup upload (since I don't know which verbosity level would give me an "uploaded ### out of ### bytes" display), while doing an "ls -l" of /tmp/duplicity-xyzzyx-tempdir/ so I could watch the files being created. And I noticed the pattern was:

* Volume 1 is prepared.
* Volume 1 finishes preparing, starts uploading.
* While volume 1 is uploaded, volume 2 is prepared.
* Volume 2 finishes preparing; volume 1 is still uploading, so Duplicity waits for the upload to complete.
* Volume 1 finishes uploading; volume 2 immediately starts uploading.
* While volume 2 is uploaded, NOTHING is prepared.
* Volume 2 finishes uploading; now volume 3 is prepared. (Now NOTHING is being uploaded while volume 3 is prepared).
* Volume 3 finishes preparing, starts uploading.
* While volume 3 is uploaded, volume 4 is prepared.
* Volume 4 finishes preparing; volume 3 is still uploading, so Duplicity waits for the upload to complete.
* Volume 3 finishes uploading; volume 4 immediately starts uploading.
* While volume 4 is uploaded, NOTHING is prepared.
* Volume 4 finishes uploading; now volume 5 is prepared. (Now NOTHING is being uploaded while volume 5 is prepared).

Etc., etc., etc., until all volumes are complete.

Notes:

This is not a major bug, but it does mean that the upload bandwidth isn't being used as efficiently as it should be, which is what --asynchronous-upload was meant for. As it currently stands, all that --asynchronous-upload does is make uploads happen in "batches" of 2 volumes rather than "batches" of 1 volume; uploads still have to wait while the next "batch" is prepared (or at least until the first volume of the next "batch" is prepared).

I'm sure this is repeatable, though the problem won't show up if your upload bandwidth is faster than the preparation of the next volume. If you run this test on an internal gigabit network, where preparing the next volume is the bottleneck rather than bandwidth, you probably won't notice any difference. But run a test using an external server on a not-very-fast connection, or throttle your bandwidth somehow so that your upload bandwidth becomes the bottleneck, and you should observe the same pattern I did: volumes being uploaded in "batches" of 2, with a gap in between for the next batch to be prepared.
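
To put rough numbers on the difference, here is a back-of-the-envelope model of the two patterns. It is a simplification under stated assumptions: every volume takes the same time to prepare and the same time to upload, and the function names and timings are made up for illustration, not measured.

```python
def intended_total(n, prep, upload):
    """Total time if preparing volume N+1 always overlaps uploading volume N."""
    if n == 0:
        return 0
    # First volume must be prepared before anything uploads; after that the
    # slower of the two stages sets the pace, and the last upload finishes it.
    return prep + (n - 1) * max(prep, upload) + upload


def observed_total(n, prep, upload):
    """Total time for the pattern in this report: uploads in "batches" of 2."""
    pairs, leftover = divmod(n, 2)
    # Per pair: prepare the first volume with nothing uploading, upload it
    # while the second is prepared, then upload the second with nothing
    # being prepared.
    total = pairs * (prep + max(prep, upload) + upload)
    if leftover:
        total += prep + upload  # a final unpaired volume
    return total


if __name__ == "__main__":
    # Hypothetical timings: 2 minutes to prepare, 5 minutes to upload.
    print(intended_total(10, 2, 5), observed_total(10, 2, 5))
```

In this model the observed pattern leaves the connection idle for one preparation time per pair of volumes, whereas the intended pattern leaves it idle only while the very first volume is prepared.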

Peter Schuller (scode) wrote :

I'll look into this.

Changed in duplicity:
assignee: nobody → Peter Schuller (scode)
Peter Schuller (scode) wrote :

This should now be fixed in lp:~scode/duplicity/bug-387102 (and the code should be less likely to be buggy and easier to understand).

There is one thing missing, though: gettext-related follow-up as a result of removing a couple of debug statements and adding one or two others. I'm not sure off the top of my head what I need to do about that (I'm not that into gettext), so I'll defer that until I've time to figure it out (if someone has a quick pointer it would be appreciated - I'll add something to a README to clarify for others not wanting to go hunt in upstream gettext docs).

Michael Terry (mterry) wrote :

Basically, for normal strings that you want translated, surround the string in _() -- that is, a function that's called 'underscore'.

Surround the constant part of the string -- that is, if you have:
"backing up %s" % filename

Make it:
_("backing up %s") % filename

No import is necessary for this, it's globally available.

If you reference a number of things in a variable, like "backing up %d files", you need to use ngettext, which takes the form (ENGLISH SINGULAR STRING, ENGLISH PLURAL STRING, VARIABLE). So in our example:
gettext.ngettext("backing up %d file", "backing up %d files", num_files) % num_files

You need to import the gettext module for this. There should be plenty of examples of both formats in the code.
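
Put together, the two forms look roughly like this. This is a standalone sketch using only the standard gettext module: _ is bound explicitly here so the snippet runs on its own (in Duplicity it is already available globally, as noted above), and the two function names are made up for illustration.

```python
import gettext

# Assumption for this sketch only: bind _ by hand. Without an installed
# translation catalog, both calls below pass the English text through.
_ = gettext.gettext


def backing_up_message(filename):
    # Mark only the constant part of the string; format it afterwards.
    return _("backing up %s") % filename


def file_count_message(num_files):
    # ngettext picks the singular or plural form based on num_files.
    return gettext.ngettext(
        "backing up %d file", "backing up %d files", num_files
    ) % num_files


if __name__ == "__main__":
    print(backing_up_message("notes.txt"))
    print(file_count_message(1))
    print(file_count_message(3))
```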

Peter Schuller (scode) wrote :

Thank you for your help. Unfortunately I was not as clear as I should have been; I have, as far as I can tell, made correct changes with respect to the invocation of _(). What I was not sure of is what, if anything, I have to do about fallout from this change. Will everything just magically work during building for various languages, or does one need to, e.g., remove unused translations? I have managed to avoid having to do any build-time gettext stuff, and IIRC one of the dependencies for gettext (I think there was some extension to the python gettext package that was used?) was not easily available in FreeBSD.

In any case, the details of that don't matter. If someone feels that the changes are enough and stand alone, for the purpose of gettext, then the branch currently constitutes my suggested fix. If something is missing with respect to .po file generation or similar though, that's something I haven't looked at.

Michael Terry (mterry) wrote :

Ah. No, nothing more you need to do. It's all magically handled by gettext and the intltool scripts used to generate the pot files.

Peter Schuller (scode) on 2009-06-17
Changed in duplicity:
status: New → In Progress
Changed in duplicity:
importance: Undecided → Medium
milestone: none → 0.6.1
status: In Progress → Fix Committed
Changed in duplicity:
status: Fix Committed → Fix Released