Large File upload fails when uploading to S3
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Duplicity | Fix Released | Medium | Unassigned | 0.6.24
Bug Description
I started to use the s3-use-multiprocessing flag when uploading to S3 to maximize my upload throughput. However, on full backups I would notice half-way through the "Full signatures" file (mine comes out to be about 3GB), I'd repeatedly get the following errors even after retries:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/duplicity/backends/_boto_multi.py", line 398, in _upload
    mp.upload_part_from_file(fd, offset + 1, cb=_upload_callback)
  File "/usr/lib/python2.7/site-packages/boto/s3/multipart.py", line 246, in upload_part_from_file
    query_args=query_args, size=size)
  File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 1121, in set_contents_from_file
    chunked_transfer=chunked_transfer, size=size)
  File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 827, in send_file
    query_args=query_args)
  File "/usr/lib/python2.7/site-packages/boto/s3/connection.py", line 490, in make_request
    override_num_retries=override_num_retries)
  File "/usr/lib/python2.7/site-packages/boto/connection.py", line 932, in make_request
    return self._mexe(http_request, sender, override_num_retries)
  File "/usr/lib/python2.7/site-packages/boto/connection.py", line 894, in _mexe
    raise e
error: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/duplicity/backends/_boto_multi.py", line 398, in _upload
    mp.upload_part_from_file(fd, offset + 1, cb=_upload_callback)
  File "/usr/lib/python2.7/site-packages/boto/s3/multipart.py", line 246, in upload_part_from_file
    query_args=query_args, size=size)
  File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 1121, in set_contents_from_file
    chunked_transfer=chunked_transfer, size=size)
  File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 827, in send_file
    query_args=query_args)
  File "/usr/lib/python2.7/site-packages/boto/s3/connection.py", line 490, in make_request
    override_num_retries=override_num_retries)
  File "/usr/lib/python2.7/site-packages/boto/connection.py", line 932, in make_request
    return self._mexe(http_request, sender, override_num_retries)
  File "/usr/lib/python2.7/site-packages/boto/connection.py", line 894, in _mexe
    raise e
timeout: timed out
So I dug into the code and found that the processor pool used by the multipart upload sets the number of processes to the number of chunks. That is fine for the 30MB volume-size archives I upload, but the signature file isn't split according to this size (I believe there is another bug for this already). Kicking off over 500 processes was either triggering a bug in my network driver or causing Amazon to rate-limit my upload attempts, and retries kept failing, so I needed a better option.
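Below is a minimal sketch of the failure mode and the fix, assuming a hypothetical upload_part() worker in place of duplicity's real boto calls; it is not the actual duplicity code:

    import multiprocessing

    def upload_part(part_number):
        # Placeholder for the real per-chunk upload call
        # (boto's upload_part_from_file in duplicity); hypothetical here.
        return part_number

    if __name__ == "__main__":
        chunk_count = 500  # e.g. a ~3GB signature file split into small chunks
        max_procs = 4      # illustrative cap on concurrent workers

        # Problematic pattern described above: one process per chunk.
        # pool = multiprocessing.Pool(processes=chunk_count)  # 500+ processes

        # Bounded pool: parts queue up and at most max_procs run at once.
        pool = multiprocessing.Pool(processes=max_procs)
        pool.map(upload_part, range(1, chunk_count + 1))
        pool.close()
        pool.join()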
I did some research on S3 upload performance (http://improve.dk/pushing-the-limits-of-amazon-s3-upload-perfor...) and tried limiting the pool to a small, fixed number of processes. This seems to have fixed my issue, so I'm attaching a patch that makes it a little more dynamic: the maximum number of processes can be controlled via a command-line argument.
I have yet to test this patch but I did test and verify that limiting the number of processes fixes the issues I was experiencing.
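For reference, an invocation with the patch applied might look like the following; the option name --s3-multipart-max-procs is my assumption here, so check bin/duplicity.1 in the attached diff for the actual spelling:

    duplicity --s3-use-multiprocessing --s3-multipart-max-procs=4 \
        /path/to/source s3+http://bucket-name/backup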
Related branches
- edso: Needs Information
Diff: 998 lines (+321/-416), 8 files modified
README (+1/-0)
bin/duplicity (+11/-0)
bin/duplicity.1 (+38/-0)
duplicity/backends/_boto_multi.py (+75/-308)
duplicity/backends/_boto_single.py (+171/-104)
duplicity/backends/botobackend.py (+11/-4)
duplicity/commandline.py (+8/-0)
duplicity/globals.py (+6/-0)
Changed in duplicity:
milestone: none → 0.6.24
Changed in duplicity:
importance: Undecided → Medium
Changed in duplicity:
status: Fix Committed → Fix Released
Thank you for the fix. I will try to get it into the next release.
...Ken