Slow backup speed

Bug #401094 reported by emilio
This bug affects 8 people
Affects    Status        Importance  Assigned to  Milestone
Duplicity  Fix Released  Low         Unassigned
Déjà Dup   Confirmed     Low         Unassigned

Bug Description

I am backing up from a laptop to an external NAS via WiFi. And I am running:
deja-dup 10.1-0jaunty1
duplicity 0.6.02-0jaunty1

I have tested the file write speed of the NAS and it is about 2MB/s. However, when backing up with deja-dup, the speed tops out at about 300KB/s.

It is also interesting that the backup on my dual-core machine uses only 50% of the available computational power, spread over the two cores.

Any idea on where the bottleneck might be?
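
The report does not say how the 2MB/s figure was measured. One way to get a comparable number is to time a plain sequential write to the mounted share. The sketch below assumes the NAS is mounted at a local path; the function name, the default test size, and the mount point in the note after it are illustrative and not taken from the report.

```python
import os
import tempfile
import time


def measure_write_speed(directory, size_mb=16, block_kb=64):
    """Write size_mb of random data into `directory` and return the rate in MB/s."""
    block = os.urandom(block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    # Create the scratch file inside the target directory so the write
    # goes through the same mount as the backup would.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.monotonic()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data out of its cache
        elapsed = time.monotonic() - start
    finally:
        os.remove(path)
    # Guard against a zero interval on very fast storage.
    return size_mb / max(elapsed, 1e-9)
```

Calling `measure_write_speed("/mnt/nas")` (a hypothetical mount point) returns a figure in MB/s that can be compared with the rate observed during a backup. A small test file on a cached network mount can overstate the real rate, so a larger `size_mb` is likely to give a more honest number.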

Changed in duplicity:
importance: Undecided → Medium
status: New → Confirmed
Kenneth Loafman (kenneth-loafman) wrote :

Try using the --asynchronous-upload option. It will overlap the marshal/upload cycle to an extent and keep the network a bit busier overall. Work on threading the marshaling process needs to be done, but the complexity of the various parallel, nested, complex iterators makes that a daunting proposition without an almost complete rewrite.
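
For anyone running duplicity directly rather than through Déjà Dup, the option is passed on the command line. A minimal sketch, assuming duplicity is on the PATH; the helper names and any paths passed to them are hypothetical.

```python
import subprocess


def build_duplicity_command(source_dir, target_url, asynchronous=True):
    """Return the argv list for a duplicity backup run."""
    cmd = ["duplicity"]
    if asynchronous:
        # The option suggested in this thread.
        cmd.append("--asynchronous-upload")
    cmd += [source_dir, target_url]
    return cmd


def run_backup(source_dir, target_url):
    """Run the backup; raises CalledProcessError if duplicity exits non-zero."""
    subprocess.run(build_duplicity_command(source_dir, target_url), check=True)
```

Building the argument list separately from running it makes it easy to print the command and check it before starting a long backup.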

emilio (emiliomaggio) wrote :

Should deja-dup, if the backup is on a local network, use this option as a default?

Kenneth Loafman (kenneth-loafman) wrote : Re: [Bug 401094] Re: Slow backup speed

emilio wrote:
> Should deja-dup, if the backup is on a local network, use this option as
> a default?

I see no reason not to; however, the author has not stated that. I
tested it and now use it on my own local network with several machines
and have not had a failure, but until the author thinks it's ready, I'm
going to defer to his judgment.

Michael Terry (mterry) wrote :

I've looked into it before, but there were some problems with it. I believe they were largely UI-oriented (difficult to show what duplicity was doing, which file it was marshalling, etc).

Now that duplicity tells us when it is uploading a chunk, I may be able to give meaningful progress indication again if this is enabled. I'll look at it.

However, the non-use of --asynchronous-upload does not explain the max 300KB/s speed; at best it explains the 50% CPU, though even then, Ken, you say that the marshalling isn't threaded.

So, maybe I don't understand all that asynchronous does, but I suspect it doesn't necessarily help here.

Kenneth Loafman (kenneth-loafman) wrote :

Michael Terry wrote:
> I've looked into it before, but there were some problems with it. I
> believe they were largely UI-oriented (difficult to show what duplicity
> was doing, which file it was marshalling, etc).
>
> Now that duplicity tells us when it is uploading a chunk, I may be able
> to give meaningful progress indication again if this is enabled. I'll
> look at it.
>
> However, the non-use of --asynchronous-upload does not explain the max
> 300KB/s speed; at best it explains the 50% CPU, though even then, Ken,
> you say that the marshalling isn't threaded.
>
> So, maybe I don't understand all that asynchronous does, but I suspect
> it doesn't necessarily help here.

Depending on how fast the network is, duplicity will be able to overlap
its marshaling with the previous file being uploaded. We don't build up
a queue, so if it takes longer to marshal the files for the next volume,
say a bunch of small files, then you have dead time in the network.

As to CPU, IO does not use much, only the marshaling process and
encryption, and 50% sounds about right in bursts.

...Ken
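
The overlap described above can be put into a small timing model. This is only a sketch of the behaviour as described in the comment, not duplicity's actual code: it assumes that marshaling of the next volume starts when the current volume is handed to the uploader, and that no further volumes are queued. The function names and the example timings are made up for illustration.

```python
def synchronous_time(marshal, upload):
    """Total time when each volume is marshaled and then uploaded in turn."""
    return sum(marshal) + sum(upload)


def overlapped_schedule(marshal, upload):
    """Return (total_time, network_idle) when one volume is prepared ahead.

    marshal[i] and upload[i] are the seconds needed to marshal and to
    upload volume i. network_idle is the time the uploader spends waiting
    for the next volume after the first upload has started.
    """
    network_idle = 0.0
    marshal_start = 0.0
    upload_done = None
    for m, u in zip(marshal, upload):
        marshal_done = marshal_start + m
        if upload_done is None:
            upload_start = marshal_done          # first volume: nothing to wait for
        else:
            upload_start = max(marshal_done, upload_done)
            network_idle += upload_start - upload_done
        upload_done = upload_start + u
        marshal_start = upload_start             # next volume starts at hand-off
    return upload_done, network_idle
```

With three volumes that each take 4 s to marshal and 2 s to upload, the synchronous total is 18 s; the overlapped schedule takes 14 s and leaves the network idle for 4 s, which is the "dead time" case. If marshaling takes 1 s and uploading 3 s per volume, the overlapped total is 10 s with no idle time, so the network stays busy.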

Michael Terry (mterry) wrote :

> Depending on how fast the network is, duplicity will be able to overlap
> its marshaling with the previous file being uploaded. We don't build up
> a queue, so if it takes longer to marshal the files for the next volume,
> say a bunch of small files, then you have dead time in the network.

Right, but he said it maxes out at 300. Asynchronous shouldn't have anything to do with max speed, but rather throughput. emilio, can you clarify if the problem is max speed or throughput?

> As to CPU, IO does not use much, only the marshaling process and
> encryption, and 50% sounds about right in bursts.

emilio has a dual-core, so 50% is really 100% of one core. I meant that since the marshaling isn't threaded, even if asynchronous were turned on, we'd still only use 100% of one core. (Though we'd be using it more often -- i.e. even when uploading something. Another max vs. throughput issue)

So again, unless emilio can clarify about the max/throughput question, I don't think asynchronous is the answer here. We still want both network and CPU maxes to be higher, even though asynchronous would help with overall throughput.

Kenneth Loafman (kenneth-loafman) wrote :

Michael Terry wrote:
>> Depending on how fast the network is, duplicity will be able to overlap
>> its marshaling with the previous file being uploaded. We don't build up
>> a queue, so if it takes longer to marshal the files for the next volume,
>> say a bunch of small files, then you have dead time in the network.
>
> Right, but he said it maxes out at 300. Asynchronous shouldn't have
> anything to do with max speed, but rather throughput. emilio, can you
> clarify if the problem is max speed or throughput?

If this is a throughput issue, then 300 may be the max we can get
without async. If this is a burst issue (need to know how measured),
then much is dependent on the protocol used. Once it's in the backend,
it's pretty much out of duplicity's control.

>> As to CPU, IO does not use much, only the marshaling process and
>> encryption, and 50% sounds about right in bursts.
>
> emilio has a dual-core, so 50% is really 100% of one core. I meant that
> since the marshaling isn't threaded, even if asynchronous were turned
> on, we'd still only use 100% of one core. (Though we'd be using it more
> often -- i.e. even when uploading something. Another max vs. throughput
> issue)

Sorry, I'm used to looking at 'top' which shows each CPU. Kind of
surprised that some of it is not offloaded to the other CPU since there
are multiple sub-processes running. Wonder what's up with that.

> So again, unless emilio can clarify about the max/throughput question, I
> don't think asynchronous is the answer here. We still want both network
> and CPU maxes to be higher, even though asynchronous would help with
> overall throughput.

Asynchronous will help with throughput, but if done with a queue, the
result will still be the max that the backend can do. If it's done with
parallel connections, we can get a big gain at the cost of more space
needed.

emilio (emiliomaggio) wrote :

It is max speed. It never goes over 300KB/s.

E.

emilio (emiliomaggio) wrote :

Also, the 50% is not on one CPU; it is about 25% per CPU.

E.

emilio (emiliomaggio) wrote :

Sorry, I am answering as I am reading.

The protocol is Samba. However, with the same protocol I can achieve
a file transfer rate of about 2MB/s. Therefore it seems to be something
related to how duplicity uses Samba.

If you want me to run some tests please let me know.

Emilio

emilio (emiliomaggio) wrote :

I have done some extra tests and part of the problem was not related to deja-dup but to the driver of my wireless card. Now I can achieve about 500KB/s transfer speed.

However, there is still a considerable difference between the maximum transfer speed achieved by deja-dup and the speed achieved by simply uploading a file with the same protocol, which is about 4x faster.

Michael Terry (mterry) wrote :

I'm not a network expert. But I believe that max speeds are easier to achieve over the time of an upload. So maybe the chunks need to be bigger to hit the max.
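
The chunk-size idea can be illustrated with a simple model in which every uploaded volume pays a fixed cost (connection setup, file creation, and so on) before data flows at the link rate. This is a sketch under that assumption only; the 5-second overhead and the 2 MB/s link rate used in the note below are invented for illustration and were not measured on the setup in this report.

```python
def effective_rate(volume_mb, link_mb_per_s, overhead_s):
    """Average MB/s for one volume when each upload pays a fixed setup cost."""
    transfer_s = volume_mb / link_mb_per_s
    return volume_mb / (overhead_s + transfer_s)


def rates_by_volume_size(sizes_mb, link_mb_per_s, overhead_s):
    """Map each volume size (MB) to its effective rate (MB/s)."""
    return {size: effective_rate(size, link_mb_per_s, overhead_s) for size in sizes_mb}
```

Under these assumed numbers, `rates_by_volume_size([5, 25, 100], 2.0, 5.0)` gives roughly 0.67, 1.43 and 1.82 MB/s, so larger volumes get closer to the link rate but never reach it. In duplicity the volume size is set with `--volsize`; whether a larger value helps on a given setup depends on how large the per-volume cost really is.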

Maybe it's because it's going through the FUSE layer to talk to samba (just an extra layer of I/O). The next version of deja-dup will write directly to samba if duplicity can support it (as recent versions do). That might help a little...

I'm not sure what to do about this bug, really.

Changed in deja-dup:
status: New → Confirmed
importance: Undecided → Low
emilio (emiliomaggio) wrote :

I agree that it is not a high-priority bug and that there might be more
than one cause.

E.

Changed in duplicity:
importance: Medium → Low
emilio (emiliomaggio) wrote :

I have just moved the backup from a small NAS drive with a Samba server to an Ubuntu box with OpenSSH, and the slow backup problem seems to be solved. I can now back up at about 1MB/sec over WiFi, and one of the two cores tops out at 100% usage.

This suggests that the problem is due to backing up to a Samba share.

Dave M G (martin-autotelic) wrote :

I'm using Deja Dup 14.2 in Ubuntu 10.04 on a 1Gb fibre optic connection, and it is crazy slow.

I'm connecting to onlinestoragesolution.com using an FTP connection, with encryption enabled. I'm attempting to back up roughly 200 GB of data.

I do understand that the first time one runs a backup it has to back up every file, so it will take a while.

However, I've been letting Deja Dup run continuously for 36 hours, and the progress bar barely shows any progress at all. It's maybe 5% done.

Compared to "Back In Time", which achieved a similar backup (without encryption) in about 3 hours, this seems slow.

Is encryption really that much of a time suck, and/or is this a normal amount of time to take?
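
To put the figures above in perspective, the implied average rate and total duration can be worked out directly. This is a rough sketch that assumes the progress bar is linear in the amount of data and that the rate stays constant, neither of which is confirmed in this thread; the function name is illustrative.

```python
def estimate(total_gb, fraction_done, hours_elapsed):
    """Return (average KB/s so far, estimated total hours) at a constant rate.

    Uses decimal units: 1 GB = 1e9 bytes, 1 KB = 1e3 bytes.
    """
    done_bytes = total_gb * 1e9 * fraction_done
    rate_kb_s = done_bytes / (hours_elapsed * 3600) / 1e3
    total_hours = hours_elapsed / fraction_done
    return rate_kb_s, total_hours
```

`estimate(200, 0.05, 36)` comes out at roughly 77 KB/s and about 720 hours (30 days) for the full backup. For comparison, 200 GB in 3 hours corresponds to roughly 18.5 MB/s.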

spaetz (spaetz) wrote :

With regard to #15, this might well be on onlinestoragesolution.com's side. I am used to transferring data to them over days rather than hours.

David Sardari (david-sardari) wrote :

I use duplicity to back up 28 GB of data to the local hard drive. My only problem is duplicity's bandwidth.
If I copy files to an external hard drive over eSATA, I get 80-110 MB/sec according to rsync. With duplicity, however, I only get about 1 MB/sec according to iotop during an incremental backup to the internal hard drive. During the backup, no other processes that use a lot of bandwidth are running. Is there a way to speed up the backup process?

These are my hardware and software specifications:

root@xeon:~# free
             total       used       free     shared    buffers     cached
Mem:       8195568    8095008     100560          0      20216    7504640
-/+ buffers/cache:     570152    7625416
Swap:      9764856          0    9764856

root@xeon:~# uname -a
Linux xeon 2.6.32-24-server #43-Ubuntu SMP Thu Sep 16 16:05:42 UTC 2010 x86_64 GNU/Linux

root@xeon:~# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION="Ubuntu 10.04.1 LTS"

root@xeon:~# duplicity -V
duplicity 0.6.08b

I have a quadcore Xeon X3220 @ 2.40GHz and two internal hard drives in raid 1 with a hot spare managed by mdadm.

David Sardari (david-sardari) wrote :

This is the output of "iotop -o":

Total DISK READ: 878.66 K/s | Total DISK WRITE: 0.00 B/s
  TID  PRIO  USER   DISK READ   DISK WRITE  SWAPIN IO>     COMMAND
17601  be/4  david  878.66 K/s  0.00 B/s    ?unavailable?  python /usr/bin/duplicity --encrypt-key BA50A935 /backup/duplicity file:///duplicity/

David Sardari (david-sardari) wrote :

I put the contents of /backup/duplicity and /duplicity on two different new hard drives and rebooted to empty the 8 GB of RAM. Then I tried another incremental backup and got around 6-10 MB/sec, IMHO due to the empty RAM.
I have 3 qcow2 files with sizes of 6.5G, 11G and 8.3G in /backup/duplicity/, among other smaller files. Is it better to convert these huge qcow2 files to raw format for the incremental backup? The files in raw format would have sizes of 10G, 20G and 100G.

David Sardari (david-sardari) wrote :

I found the bugs #550455 and #582962 and solved the problem some days ago by upgrading to the newest version of duplicity. :)

Changed in duplicity:
status: Confirmed → Fix Released
Michael Trunner (trunneml) wrote :

Is that bug really fixed? I have exactly the problem described. The destination is a Samba share that can handle 12MB/sec, and duplicity started by deja-dup only uses 300KB/sec. My duplicity version is 0.6.12.

Dmitry (dimon3000) wrote :

This problem is becoming more relevant nowadays.
