cloudfiles backend slow
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Duplicity | New | Undecided | Unassigned |
Bug Description
I noticed that the cloudfiles backend was incredibly slow. After poking around for a bit, I've realised that the culprit is the call to socket.setdefaulttimeout().
I created a simple test script to upload a 100 MB file to Cloud Files. It does pretty much exactly what the cloudfiles backend does:
import os
import socket
from cloudfiles import Connection
from cloudfiles.errors import ResponseError
from cloudfiles import consts

# Credentials come from the environment, mirroring the duplicity backend.
conn_kwargs = {}
conn_kwargs['username'] = os.environ['CLOUDFILES_USERNAME']
conn_kwargs['api_key'] = os.environ['CLOUDFILES_APIKEY']
conn_kwargs['authurl'] = consts.default_authurl

conn = Connection(**conn_kwargs)
# Container and file names are placeholders.
container = conn.create_container('testcontainer')
sobject = container.create_object('100mb.bin')
sobject.load_from_filename('100mb.bin')
If I run it like that, it takes around 15 seconds to upload 100 MB. If I add a call to socket.setdefaulttimeout() before the upload, the transfer slows down dramatically.
If I strace the two runs, I see a call to poll() before each write(). This gets added by Python's socketmodule.c due to the default timeout.
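Here's a minimal sketch to observe the effect locally without Cloud Files (the loopback pair, block size, and iteration count are my own choices, not from the backend): strace it with and without the setdefaulttimeout() line and compare the poll() counts.

import socket

socket.setdefaulttimeout(5)  # comment this line out for the fast case

# Loopback pair: one socket writes 4 kB blocks, the other drains them.
srv = socket.socket()
srv.bind(('127.0.0.1', 0))
srv.listen(1)

cli = socket.socket()
cli.connect(srv.getsockname())
conn, _ = srv.accept()

for _ in range(1000):
    cli.sendall(b'x' * 4096)  # with a default timeout set, each send is preceded by poll()
    conn.recv(4096)           # drain so the kernel buffer never fills

cli.close()
conn.close()
srv.close()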
I tried adding timestamps to the strace log and counted how many system calls each of the two runs makes over the course of a single second while transferring the data. With the default socket timeout, I got just over 100 system calls (poll, write, read, poll, write, read, etc.). It's dealing with a block size of 4 kB, so that's 4 kB * (100/3) ≈ 133 kB/s. Without the default socket timeout, I got around 1120 system calls (read, write, read, write, etc.). That translates to 4 kB * (1120/2) = 2240 kB/s. That's a pretty hefty difference.
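The same back-of-the-envelope arithmetic as a quick sanity check (block size and syscall counts are the strace observations above):

block_kb = 4.0  # 4 kB per write(), as seen in strace

# With the default timeout: poll, write, read -> 3 syscalls per block.
print(block_kb * 100 / 3)    # ~133 kB/s

# Without it: write, read -> 2 syscalls per block.
print(block_kb * 1120 / 2)   # 2240 kB/s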
What confuses me, though, is that cloudfiles.Connection.__init__() seems to set the default socket timeout itself, so my test script should have been just as slow.
Heh! No, cloudfiles.Connection.__init__() doesn't call socket.setdefaulttimeout() after all. Here's the code snippet:
class Connection(object):
    [...]
    def __init__(self, username=None, api_key=None, **kwargs):
        [...]
        socket.setdefaulttimeout = int(kwargs.get('timeout', 5))
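The giveaway is that this line is an assignment, not a call: it replaces the socket.setdefaulttimeout function with an int, so no timeout is ever installed. A minimal sketch of the difference (kwargs here is a stand-in for the keyword arguments passed to Connection()):

import socket

kwargs = {}  # stand-in for the **kwargs passed to Connection()

_real = socket.setdefaulttimeout  # keep a reference for this demo

# Buggy form from the snippet above: clobbers the function with an int.
socket.setdefaulttimeout = int(kwargs.get('timeout', 5))
print(type(socket.setdefaulttimeout))  # an int -- no timeout was set

# Intended form: actually call the function.
socket.setdefaulttimeout = _real
socket.setdefaulttimeout(int(kwargs.get('timeout', 5)))
print(socket.getdefaulttimeout())  # 5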
Later versions of python-cloudfiles have this problem fixed and exhibit this slowness regardless of the socket.setdefaulttimeout() call in duplicity. I suppose I should file a bug against python-cloudfiles upstream instead.