Comment 2 for bug 1182088

Kenneth Loafman (kenneth-loafman) wrote: Re: [Bug 1182088] [NEW] Large File upload fails when uploading to S3

Thank you for the fix. I will try to get it into the next release.

...Ken

On Mon, May 20, 2013 at 9:24 AM, someone1 <email address hidden> wrote:

> Public bug reported:
>
> I started to use the s3-use-multiprocessing flag when uploading to S3 to
> maximize my upload throughput. However, on full backups I would notice
> half-way through the "Full signatures" file (mine comes out to be about
> 3GB), I'd repeatedly get the following errors even after retries:
>
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/duplicity/backends/_boto_multi.py", line 398, in _upload
>     mp.upload_part_from_file(fd, offset + 1, cb=_upload_callback)
>   File "/usr/lib/python2.7/site-packages/boto/s3/multipart.py", line 246, in upload_part_from_file
>     query_args=query_args, size=size)
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 1121, in set_contents_from_file
>     chunked_transfer=chunked_transfer, size=size)
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 827, in send_file
>     query_args=query_args)
>   File "/usr/lib/python2.7/site-packages/boto/s3/connection.py", line 490, in make_request
>     override_num_retries=override_num_retries)
>   File "/usr/lib/python2.7/site-packages/boto/connection.py", line 932, in make_request
>     return self._mexe(http_request, sender, override_num_retries)
>   File "/usr/lib/python2.7/site-packages/boto/connection.py", line 894, in _mexe
>     raise e
> error: [Errno 32] Broken pipe
>
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/duplicity/backends/_boto_multi.py", line 398, in _upload
>     mp.upload_part_from_file(fd, offset + 1, cb=_upload_callback)
>   File "/usr/lib/python2.7/site-packages/boto/s3/multipart.py", line 246, in upload_part_from_file
>     query_args=query_args, size=size)
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 1121, in set_contents_from_file
>     chunked_transfer=chunked_transfer, size=size)
>   File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 827, in send_file
>     query_args=query_args)
>   File "/usr/lib/python2.7/site-packages/boto/s3/connection.py", line 490, in make_request
>     override_num_retries=override_num_retries)
>   File "/usr/lib/python2.7/site-packages/boto/connection.py", line 932, in make_request
>     return self._mexe(http_request, sender, override_num_retries)
>   File "/usr/lib/python2.7/site-packages/boto/connection.py", line 894, in _mexe
>     raise e
> timeout: timed out
>
> So I dug into the code and found that the process pool used by the
> multipart upload sets the number of worker processes to the number of
> chunks. That is fine for the 30MB volume-size archives I upload, but the
> signature file is not split according to that size (I believe there is
> already another bug for this). Kicking off over 500 processes either
> triggered a bug in my network driver or caused Amazon to rate-limit my
> upload attempts; retries kept failing, so I needed a better option.
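>
> (As a rough check on that count: if the parts are near S3's 5MiB minimum
> part size, a ~3GB file splits into about 3 * 1024 / 5 ≈ 614 parts, which
> lines up with the 500+ processes I saw.)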
>
> I did some research
> (http://improve.dk/pushing-the-limits-of-amazon-s3-upload-performance/)
> and found that my bandwidth would be more than saturated at 8 processes
> (the article uses threads, but the concept is the same since the work is
> I/O-bound). I set this limit in my code by changing
> http://bazaar.launchpad.net/~duplicity-team/duplicity/0.6-series/view/head:/duplicity/backends/_boto_multi.py#L393
> to:
>
>     pool = multiprocessing.Pool(processes=min(8, chunks))
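>
> For context, here is a minimal runnable sketch of the capped pool; the
> part loop and upload function are illustrative stand-ins, not the actual
> _boto_multi.py code:
>
>     import multiprocessing
>
>     MAX_PROCS = 8  # enough to saturate my uplink, per the article above
>
>     def upload_part(part_number):
>         # Stand-in for mp.upload_part_from_file(...) in _boto_multi.py.
>         print("uploading part %d" % part_number)
>
>     def upload_all(chunks):
>         # Cap the pool instead of forking one process per chunk, so a
>         # 600-part signature file no longer spawns 600 workers at once.
>         pool = multiprocessing.Pool(processes=min(MAX_PROCS, chunks))
>         for part in range(chunks):
>             pool.apply_async(upload_part, [part])
>         pool.close()
>         pool.join()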
>
> This seems to have fixed my issue, so I'm attaching a patch that makes
> this a little more dynamic: the maximum number of processes can be
> controlled via a command-line argument (sketched below).
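>
> Roughly, the idea is to wire a command-line flag through to the pool
> size; the flag name and parsing here are hypothetical stand-ins (see
> patch.patch for the actual change):
>
>     import multiprocessing
>     from optparse import OptionParser
>
>     parser = OptionParser()
>     # Hypothetical flag; the real name is whatever the patch introduces.
>     parser.add_option("--s3-multipart-max-procs", type="int", default=4,
>                       help="max concurrent processes for S3 multipart uploads")
>     options, args = parser.parse_args()
>
>     def make_pool(chunks):
>         # Never spawn more workers than there are chunks, nor more than
>         # the user-supplied cap.
>         return multiprocessing.Pool(
>             processes=min(options.s3_multipart_max_procs, chunks))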
>
> I have yet to test the patch itself, but I did test and verify that
> limiting the number of processes fixes the issues I was experiencing.
>
> ** Affects: duplicity
> Importance: Undecided
> Status: New
>
>
> ** Tags: s3
>
> ** Patch added: "patch.patch"
>
> https://bugs.launchpad.net/bugs/1182088/+attachment/3682122/+files/patch.patch
>
> --
> You received this bug notification because you are subscribed to
> Duplicity.
> https://bugs.launchpad.net/bugs/1182088
>
> Title:
> Large File upload fails when uploading to S3
>
> Status in Duplicity - Bandwidth Efficient Encrypted Backup:
> New
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/duplicity/+bug/1182088/+subscriptions
>