cleanup results in huge process

Bug #1350404 reported by Ian Chard
This bug affects 1 person
Affects: Duplicity
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

When I run duplicity with 'cleanup --extra-clean' on a large backup target, I get a 13 GB process that runs forever (or at least for the two hours I let it run). strace shows that it does no I/O at all and just spins on the CPU the whole time.

Duplicity 0.6.24
Python 2.7.3
Debian Wheezy
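
For reference, the "no I/O, spinning on CPU" observation can be reproduced with something along these lines (a rough sketch, not the exact commands I ran; replace <PID> with the duplicity process id):

strace -c -f -p <PID>                        # press Ctrl-C after ~30s; the syscall summary shows essentially no read/write activity
ps -o pid,rss,vsz,pcpu,etime,cmd -p <PID>    # confirms the resident size and ~100% CPU usage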

Here's the output of the process:

# duplicity -v9 --archive-dir /data/duplicity-archive/ --gpg-options --homedir=/data/gpg-home/ --encrypt-key xxxxxxxx --s3-european-buckets --s3-use-new-style --asynchronous-upload --full-if-older-than 10D --allow-source-mismatch --num-retries 1 cleanup --extra-clean --force cf+http://vhost_backup_dir-mammoth_data_vhost_www.xyzxyz.com_photos
Using archive dir: /data/duplicity-archive/b32efd07b4efc62f3a6f03722cbf64ae
Using backup name: b32efd07b4efc62f3a6f03722cbf64ae
Import of duplicity.backends.botobackend Succeeded
Import of duplicity.backends.cfbackend Succeeded
Import of duplicity.backends.dpbxbackend Succeeded
Import of duplicity.backends.ftpbackend Succeeded
Import of duplicity.backends.ftpsbackend Succeeded
Import of duplicity.backends.gdocsbackend Succeeded
Import of duplicity.backends.hsibackend Succeeded
Import of duplicity.backends.imapbackend Succeeded
Import of duplicity.backends.localbackend Succeeded
Import of duplicity.backends.megabackend Succeeded
Import of duplicity.backends.rsyncbackend Succeeded
Import of duplicity.backends.sshbackend Succeeded
Import of duplicity.backends.swiftbackend Succeeded
Import of duplicity.backends.tahoebackend Succeeded
Import of duplicity.backends.webdavbackend Succeeded
Import of duplicity.backends.~par2wrapperbackend Succeeded
Main action: cleanup
================================================================================
duplicity 0.6.24 (May 09, 2014)
Args: /usr/local/bin/duplicity -v9 --archive-dir /data/duplicity-archive/ --gpg-options --homedir=/data/gpg-home/ --encrypt-key xxxxxxxx --s3-european-buckets --s3-use-new-style --asynchronous-upload --full-if-older-than 10D --allow-source-mismatch --num-retries 1 cleanup --extra-clean --force cf+http://vhost_backup_dir-mammoth_data_vhost_www.xyzxyz.com_photos
Linux leopard1 3.2.0-4-amd64 #1 SMP Debian 3.2.60-1+deb7u1 x86_64
/usr/bin/python 2.7.3 (default, Mar 13 2014, 11:03:55)
[GCC 4.7.2]
================================================================================
Listing '<Container 'vhost_backup_dir-mammoth_data_vhost_www.xyzxyz.com_photos'>'
Synchronizing remote metadata to local cache...
Copying duplicity-full-signatures.20140703T113651Z.sigtar.gpg to local cache.
Using temporary directory /tmp/duplicity-QZGsOA-tempdir
Registering (mktemp) temporary file /tmp/duplicity-QZGsOA-tempdir/mktemp-_tPj6e-1
Downloading '<Container 'vhost_backup_dir-mammoth_data_vhost_www.xyzxyz.com_photos'>/duplicity-full-signatures.20140703T113651Z.sigtar.gpg'
[...hangs forever...]

And here's the output of collection-status for this target:

Local and Remote metadata are synchronized, no sync needed.
Warning, found incomplete backup sets, probably left from aborted session
Last full backup date: Sat Jul 26 10:24:43 2014
Collection Status
-----------------
Connecting with backend: PyraxBackend
Archive directory: /data/duplicity-archive/b32efd07b4efc62f3a6f03722cbf64ae

Found 3 secondary backup chains.
Secondary chain 1 of 3:
-------------------------
Chain start time: Thu Jul 3 12:36:51 2014
Chain end time: Wed Jul 9 10:15:36 2014
Number of contained backup sets: 7
Total number of contained volumes: 5821
 Type of backup set: Time: Number of volumes:
                Full Thu Jul 3 12:36:51 2014 2795
         Incremental Fri Jul 4 09:35:39 2014 678
         Incremental Sat Jul 5 09:18:24 2014 327
         Incremental Sun Jul 6 09:36:55 2014 286
         Incremental Mon Jul 7 09:21:16 2014 426
         Incremental Tue Jul 8 10:40:53 2014 809
         Incremental Wed Jul 9 10:15:36 2014 500
-------------------------

Secondary chain 2 of 3:
-------------------------
Chain start time: Thu Jul 10 12:44:22 2014
Chain end time: Thu Jul 17 09:25:27 2014
Number of contained backup sets: 8
Total number of contained volumes: 8212
 Type of backup set: Time: Number of volumes:
                Full Thu Jul 10 12:44:22 2014 2772
         Incremental Fri Jul 11 10:08:27 2014 636
         Incremental Sat Jul 12 09:11:38 2014 609
         Incremental Sun Jul 13 09:23:05 2014 429
         Incremental Mon Jul 14 09:58:30 2014 656
         Incremental Tue Jul 15 11:11:28 2014 1504
         Incremental Wed Jul 16 09:10:31 2014 948
         Incremental Thu Jul 17 09:25:27 2014 658
-------------------------

Secondary chain 3 of 3:
-------------------------
Chain start time: Fri Jul 18 10:29:29 2014
Chain end time: Fri Jul 25 09:21:46 2014
Number of contained backup sets: 8
Total number of contained volumes: 7746
 Type of backup set: Time: Number of volumes:
                Full Fri Jul 18 10:29:29 2014 2888
         Incremental Sat Jul 19 09:19:32 2014 531
         Incremental Sun Jul 20 09:35:08 2014 580
         Incremental Mon Jul 21 09:56:36 2014 599
         Incremental Tue Jul 22 10:37:19 2014 765
         Incremental Wed Jul 23 07:51:22 2014 638
         Incremental Thu Jul 24 10:00:52 2014 901
         Incremental Fri Jul 25 09:21:46 2014 844
-------------------------

Found primary backup chain with matching signature chain:
-------------------------
Chain start time: Sat Jul 26 10:24:43 2014
Chain end time: Wed Jul 30 08:55:59 2014
Number of contained backup sets: 5
Total number of contained volumes: 5485
 Type of backup set: Time: Number of volumes:
                Full Sat Jul 26 10:24:43 2014 2930
         Incremental Sun Jul 27 08:56:41 2014 406
         Incremental Mon Jul 28 08:39:07 2014 779
         Incremental Tue Jul 29 09:29:27 2014 711
         Incremental Wed Jul 30 08:55:59 2014 659
-------------------------
Also found 0 backup sets not part of any chain,
and 8 incomplete backup sets.
These may be deleted by running duplicity with the "cleanup" command.

Revision history for this message
Ian Chard (ian-chard) wrote :

This process has now been running for over 16 hours. It has progressed a little, reading more manifest files, but is still very large and using 100% CPU.

Revision history for this message
Ian Chard (ian-chard) wrote :

I've tried the same command without --extra-clean and it's exhibiting the same behaviour.

summary: - cleanup --extra-clean results in huge process
+ cleanup results in huge process
Ian Chard (ian-chard)
description: updated
Revision history for this message
edso (ed.so) wrote :

1. Can you send a full maximum-verbosity ('-v9') output? Obfuscate any strings in it you consider private or security sensitive!
Use pastebin or the like.

It looks like 'cleanup' is patched by the Debian maintainer to always behave as --extra-clean, which completely overrides and as such obsoletes that parameter!

2. If possible, please try the latest Python 2.7 to rule out a Python issue. Current is 2.7.8 .. you could compile a local version or use the one already in Debian testing.

3. Could it be that your system starts swapping because of the high memory consumption? How many gpg processes does it keep open? Can you send a process list output (roughly along the lines sketched below)?
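
For point 3, something like the following would do (just a sketch; replace <PID> with the duplicity process id):

free -m                                     # is any swap in use at all?
vmstat 1 5                                  # si/so columns show swap-in/swap-out activity
ps aux --sort=-rss | head -20               # largest processes by resident memory
ps -o pid,ppid,rss,pcpu,cmd --ppid <PID>    # children of the duplicity process (gpg etc.)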

..ede/duply.net

Revision history for this message
Ian Chard (ian-chard) wrote :

1. Full output here: http://pastebin.com/1r1TsgHi -- this is what I have after 23 hours' runtime.
I'm not using the Debian duplicity -- I'm using 0.6.24 that I downloaded and installed from the duplicity website.

2. The dependencies that the new python pulls in (libc, postgres) would mean a bit of downtime, so I can't do this straight away, but will do if you think it's definitely worth doing.

3. My system isn't swapping, and it's the duplicity process itself that's using 100% CPU. As of right now there is no gpg process running. I've looked at the process list and duplicity doesn't have any child processes. Process listing here: http://pastebin.com/xmAkdND9

Thanks for your help so far!
- Ian

Revision history for this message
Ian Chard (ian-chard) wrote :

Another couple of data points:

* The 13 GB clean job finally finished over the weekend; total runtime was just over 3 days.

* I've been running a clean against a different target today, and it's been running all day at 100% CPU, process size 3 GB, no gpg children.
