Amazon S3 backend multipart upload support

Bug #676109 reported by Tomaz Muraus
This bug affects 5 people
Affects    Status        Importance  Assigned to  Milestone
Duplicity  Fix Released  Medium      Unassigned

Bug Description

Amazon S3 recently announced multipart upload support - http://aws.typepad.com/aws/2010/11/amazon-s3-multipart-upload.html

Adding support for this to the Amazon S3 backend would be great: multiple parts can be uploaded in parallel, which makes uploads faster, and if an upload fails for some reason, only the parts that were not uploaded successfully need to be resent on retry instead of the whole volume.
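As a rough illustration of the retry benefit described above, the following Python sketch splits a volume into numbered parts and re-sends only the parts that failed. The names `part_ranges` and `upload_with_retry` and the `upload_part` callable are hypothetical and are not taken from duplicity or boto; a real backend would call the S3 API inside `upload_part`.

```python
import math


def part_ranges(total_size, chunk_size):
    """Yield (part_number, offset, length) tuples; part numbers start at 1."""
    count = max(1, int(math.ceil(total_size / float(chunk_size))))
    for index in range(count):
        offset = index * chunk_size
        yield index + 1, offset, min(chunk_size, total_size - offset)


def upload_with_retry(total_size, chunk_size, upload_part, max_attempts=3):
    """Upload every part, retrying only the parts that raised IOError.

    upload_part is a hypothetical callable taking (part_number, offset, length).
    Returns True once all parts have been sent, False if some still fail
    after max_attempts rounds.
    """
    pending = list(part_ranges(total_size, chunk_size))
    for _ in range(max_attempts):
        failed = []
        for part in pending:
            try:
                upload_part(*part)
            except IOError:
                failed.append(part)  # remember only what still needs sending
        if not failed:
            return True
        pending = failed
    return False
```

In this sketch a failure in one part costs at most one chunk of re-upload per retry round, rather than the whole volume.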

bJXjLjEHIaWT0tFd (bjxjljehiawt0tfd-deactivatedaccount) wrote :

Absolutely essential for reliable sigtar uploads as uploads of more than a few hundred MB regularly fail.

Henrique Carvalho Alves (hcarvalhoalves) wrote :

I have implemented support for multipart upload using the newest python-boto API and multiprocessing. I'm currently using it in production with good results. By tweaking the volume size and chunk size I can run as many as 10 parallel upload processes, which reduced upload times from days to hours in my case.

Please have a look at my branch. I would love feedback on it, especially on error handling.

https://github.com/hcarvalhoalves/duplicity/tree/s3-multipart-upload

I'm also attaching a patch against 0.6.15.

Thanks!

Changed in duplicity:
importance: Undecided → Medium
milestone: none → 0.6.16
status: New → Fix Committed
derp herp (junkmail-trash) wrote :

Hi,

This breaks the boto backend for *BSD as the python multiprocessing module is not supported on *BSD (and a number of other platforms).

Please modify your patch to make this an optional flag to allow the boto backend to be used without multiprocessing.

Henrique Carvalho Alves (hcarvalhoalves) wrote :

Too bad the multiprocessing module doesn't support BSD; I wasn't aware of that.

I would rather change the code to fall back to threading instead, so all platforms can enjoy support for multipart upload. Just disabling it makes Duplicity close to useless for any large backups to S3, as Amazon throttles uploads. I'll have to check if threading is better supported on other platforms, though.

Henrique Carvalho Alves (hcarvalhoalves) wrote :

The patch attached in this ticket introduced a bug; please refer to bug #881070:

https://bugs.launchpad.net/duplicity/+bug/881070

Henrique Carvalho Alves (hcarvalhoalves) wrote :

I modified my branch to fall back to threads on platforms that don't support multiprocessing, using the provided "multiprocessing.dummy" module. I can't test if this solves the issue with *BSD right now, so please test it if you can:

https://github.com/hcarvalhoalves/duplicity/zipball/multiprocessing-fallback
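
For readers who do not want to download the branch, a fallback of this kind might look roughly like the sketch below. This is not the code from the branch. It assumes that missing process support shows up as an ImportError when importing `multiprocessing.synchronize`, which is what CPython raises on platforms without a working sem_open; `run_parallel`, `DefaultPool`, and the other names are hypothetical.

```python
# Thread-backed pool with the same map/close/join API as the process pool.
from multiprocessing.dummy import Pool as ThreadPool

try:
    # Raises ImportError on platforms lacking a functioning sem_open.
    import multiprocessing.synchronize  # noqa: F401
    from multiprocessing import Pool as ProcessPool
except ImportError:
    ProcessPool = None

# Prefer processes where available, otherwise fall back to threads.
DefaultPool = ProcessPool or ThreadPool


def run_parallel(worker, items, workers=4, pool_cls=None):
    """Map worker over items concurrently; results keep the input order."""
    pool = (pool_cls or DefaultPool)(workers)
    try:
        return pool.map(worker, items)
    finally:
        pool.close()
        pool.join()
```

Either pool type uploads parts concurrently; with the process pool the worker must be a picklable top-level function, while the thread pool has no such restriction.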

derp herp (junkmail-trash) wrote :

No joy.

Import of duplicity.backends.botobackend Failed: No module named filechunkio

-bash-4.0$ python
Python 2.7.2 (default, Oct 23 2011, 09:42:01)
[GCC 3.3.5 (propolice)] on openbsd4
Type "help", "copyright", "credits" or "license" for more information.
>>> from filechunkio import FileChunkIO
>>> FileChunkIO
<class 'filechunkio.filechunkio.FileChunkIO'>

derp herp (junkmail-trash) wrote :

Looks like this is the culprit:

-bash-4.0$ grep -nr filechunkio site-packages/duplicity
site-packages/duplicity/backends/botobackend.py:34:from duplicity.filechunkio import FileChunkIO

duplicity.filechunkio is missing

Changed in duplicity:
status: Fix Committed → In Progress
assignee: nobody → Kenneth Loafman (kenneth-loafman)
importance: Medium → High
derp herp (junkmail-trash) wrote :

Looks like changing line #34 of botobackend.py to:

from filechunkio import FileChunkIO

resolved the issue with Henrique's multiprocessing-fallback branch on OpenBSD, and chunked file upload seems to be working (I can't tell if it is in fact uploading concurrently, but it was successful)

Henrique Carvalho Alves (hcarvalhoalves) wrote :

derp herp: it should just fall back to threads instead of processes on BSD platforms. Other than that, it will work the same (upload concurrently), which is much better than just falling back to no multipart upload on BSD.

Henrique Carvalho Alves (hcarvalhoalves) wrote :

Please refer to bug 881070 and close this one.

Changed in duplicity:
status: In Progress → Fix Released
importance: High → Medium
assignee: Kenneth Loafman (kenneth-loafman) → nobody