cloudfiles backend slow

Bug #1130649 reported by Soren Hansen
This bug affects 1 person
Affects      Status  Importance  Assigned to  Milestone
Duplicity    New     Undecided   Unassigned

Bug Description

I noticed that the cloudfiles backend was incredibly slow. After poking around for a bit, I've realised that the culprit is the call to socket.setdefaulttimeout() in backend.py.

I created a simple test script to upload a 100 MB file to Cloud Files. It does pretty much exactly what the cloudfiles backend does:

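import os
import socket  # only needed for the setdefaulttimeout() experiment below
from cloudfiles import Connection
from cloudfiles.errors import ResponseError
from cloudfiles import consts

# Credentials are read from the environment.
conn_kwargs = {}
conn_kwargs['username'] = os.environ['CLOUDFILES_USERNAME']
conn_kwargs['api_key'] = os.environ['CLOUDFILES_APIKEY']
conn_kwargs['authurl'] = consts.default_authurl

# Uncomment to reproduce the slow upload:
# socket.setdefaulttimeout(30)

conn = Connection(**conn_kwargs)
# Upload the local 100 MB random file into the 'speedtest' container.
container = conn.create_container('speedtest')
sobject = container.create_object('100Mtest2rnd')
sobject.load_from_filename('100Mtest.rnd')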

If I run it like that, it takes around 15 seconds to upload 100 MB. If I add a call to socket.setdefaulttimeout(30) before the Connection call, it takes 11 *minutes*.

If I strace the two runs, the slow one shows a call to poll() before each write(). Python's socketmodule.c adds this poll() because the socket inherits the default timeout.
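A minimal sketch of the mechanism, using only the standard socket module: a socket created after socket.setdefaulttimeout() inherits that timeout, which puts it in timeout mode (the mode in which CPython waits for readiness with poll()/select() before each send/recv). Clearing the timeout with setblocking(True) returns it to plain blocking mode. The helper name inherited_timeouts is hypothetical, chosen for illustration.

```python
import socket


def inherited_timeouts(default=30.0):
    """Return (timeout inherited from the default, timeout after clearing it)."""
    previous = socket.getdefaulttimeout()
    try:
        socket.setdefaulttimeout(default)
        # Created after setdefaulttimeout(): this socket is in timeout mode.
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            with_default = s.gettimeout()
            # setblocking(True) is equivalent to settimeout(None):
            # plain blocking mode, no readiness wait before each call.
            s.setblocking(True)
            after_clear = s.gettimeout()
        finally:
            s.close()
    finally:
        socket.setdefaulttimeout(previous)  # restore the process-wide default
    return with_default, after_clear


print(inherited_timeouts())
```

Because the default is process-wide, any library that opens sockets after it is set (python-cloudfiles included) gets timeout-mode sockets unless it overrides the timeout per socket.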

I tried adding timestamps to the strace log and counted how many system calls each of the two runs makes in a single second while transferring the data. With the default socket timeout set, I got just over 100 system calls (poll, write, read, poll, write, read, etc.). With a block size of 4 kB and three calls per block, that's 4 kB * (100/3) = 133 kB/s. Without the default socket timeout, I got around 1120 system calls (read, write, read, write, etc.). At two calls per block, that translates to 4 kB * (1120/2) = 2240 kB/s. That's a pretty hefty difference.
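The throughput estimate above can be written out as a small calculation: each block of block_kb kilobytes costs a fixed number of system calls, so blocks per second is syscalls per second divided by calls per block. The function name is hypothetical; the inputs are the strace counts quoted above.

```python
def throughput_kb_per_s(syscalls_per_second, syscalls_per_block, block_kb=4):
    """Estimate transfer rate (kB/s) from an strace syscall count."""
    blocks_per_second = syscalls_per_second / syscalls_per_block
    return block_kb * blocks_per_second


# With the default timeout: poll, write, read per block.
with_timeout = throughput_kb_per_s(100, 3)
# Without it: read, write per block.
without_timeout = throughput_kb_per_s(1120, 2)

print(round(with_timeout), without_timeout, round(without_timeout / with_timeout, 1))
# prints: 133 2240.0 16.8
```

The extra poll() alone only adds one call per block; the roughly 17x gap comes from far fewer calls completing per second in the timeout run.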

What confuses me, though, is that cloudfiles.Connection.__init__() also calls socket.setdefaulttimeout (passing it a value of 5 by default).

Soren Hansen (soren) wrote:

Heh! No, cloudfiles.Connection.__init__() doesn't call socket.setdefaulttimeout after all; it assigns to it. Here's the code snippet:

class Connection(object):
    [...]
    def __init__(self, username=None, api_key=None, **kwargs):
        [...]
        socket.setdefaulttimeout = int(kwargs.get('timeout', 5))
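To make the difference concrete, here is a minimal sketch contrasting the quoted assignment with an actual call. The helpers buggy_init, fixed_init and compare are hypothetical stand-ins, not python-cloudfiles code, and the "fixed" form is an assumption about what a corrected line would look like. The assignment only rebinds the module attribute to an int, so the real default timeout never changes (and later calls to socket.setdefaulttimeout() would fail).

```python
import socket


def buggy_init(timeout=5):
    # Same shape as the quoted line: rebinds the module attribute to an int.
    socket.setdefaulttimeout = int(timeout)


def fixed_init(timeout=5):
    # Calling the function actually sets the process-wide default timeout.
    socket.setdefaulttimeout(int(timeout))


def compare():
    real_setter = socket.setdefaulttimeout
    previous = socket.getdefaulttimeout()
    try:
        buggy_init()
        after_buggy = socket.getdefaulttimeout()           # unchanged
        clobbered = not callable(socket.setdefaulttimeout)  # now an int
        socket.setdefaulttimeout = real_setter             # undo the damage
        fixed_init()
        after_fixed = socket.getdefaulttimeout()           # 5.0
    finally:
        socket.setdefaulttimeout = real_setter
        real_setter(previous)                              # restore the default
    return previous, after_buggy, clobbered, after_fixed


print(compare())
```

This is why the older python-cloudfiles did not slow things down on its own: its "timeout" never took effect.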

Later versions of python-cloudfiles have this fixed, so they exhibit this slowness regardless of the socket.setdefaulttimeout call in duplicity. I suppose I should raise this with python-cloudfiles upstream instead.
