silent data corruption with checkpoint/restore

Bug #613244 reported by Liraz on 2010-08-03
52
This bug affects 6 people
Affects Status Importance Assigned to Milestone
Duplicity
High
Unassigned
duplicity (Ubuntu)
High
Unassigned
Lucid
Undecided
Unassigned
Maverick
Undecided
Unassigned

Bug Description

I've found a serious bug with checkpoint/restore that leads to
silent data corruption. I'm using duplicity 0.6.09 installed
on a 32-bit Ubuntu system, backing up to the local filesystem.

I discovered this by running two full backups with the same parameters.

duplicity --archive-dir /var/cache/duplicity --volsize 10
--include-filelist /tmp/filelist --exclude '**' /
file:///var/tmp/duplicity

The first backup I allowed to complete without interruption.

The second backup I repeatedly stopped/resumed after the first volume
had been created.

I then restored the backups to /tmp/restore.broken and /tmp/restore and
compared them as follows:

 cd /tmp/restore
 find |xargs stat -c "%n %U %G %A %s" > statlist

 cd /tmp/restore.broken
 find |xargs stat -c "%n %U %G %A %s" > statlist

 cd ../
 diff -u /tmp/restore/{statlist,statlist.broken}

I discovered what looks like one corrupt file for each time I
CTRL-C/resumed the backup. I'm pretty sure these are the files duplicity
resumed from.

Liraz (liraz-siri) wrote :

Reproduced this bug more simply to rule out anything having to do with --include-filelist and such.

First backup:

duplicity --volsize 5 /data file:///tmp/backup/uninterrupted

Second backup:

duplicity --volsize 5 /data file:///tmp/backup/interrupted

Interrupted repeatedly with CTRL-C.

Now restore the interrupted and uninterrupted backups and compare.
duplicity file:///tmp/backup/uninterrupted /tmp/restore/uninterrupted
duplicity file:///tmp/backup/interrupted /tmp/restore/interrupted

cd /tmp/restore/uninterrupted
find |xargs stat -c "%n %U %G %A %s" > statlist

cd /tmp/restore/interrupted
find |xargs stat -c "%n %U %G %A %s" > statlist

cd /tmp/restore
diff -u {interrupted,uninterrupted}/statlist

Liraz (liraz-siri) wrote :

In what may be a separate but related bug, a verify run WILL NOT discover the data corruption:

duplicity verify file:///tmp/backup/interrupted /data

Liraz (liraz-siri) wrote :

Just in case you're hoping the next incremental backup will smooth
things over, no such luck. In fact it gets worse. The next incremental
backup will corrupt the backup to the point where it won't restore at
all:

    duplicity file:///tmp/backup/interrupted /tmp/restore/interrupted
    Local and Remote metadata are synchronized, no sync needed.
    Last full backup date: Fri Jul 30 04:44:24 2010
    Traceback (most recent call last):
      File "/usr/bin/duplicity", line 1251, in <module>
        with_tempdir(main)
      File "/usr/bin/duplicity", line 1244, in with_tempdir
        fn()
      File "/usr/bin/duplicity", line 1198, in main
        restore(col_stats)
      File "/usr/bin/duplicity", line 538, in restore
        restore_get_patched_rop_iter(col_stats)):
      File "/usr/lib/python2.5/site-packages/duplicity/patchdir.py",
line 522, in Write_ROPaths
        ITR( ropath.index, ropath )
      File "/usr/lib/python2.5/site-packages/duplicity/lazy.py", line
320, in __call__
        self.process_w_branch(index, self.root_branch, args)
      File "/usr/lib/python2.5/site-packages/duplicity/lazy.py", line
293, in process_w_branch
        branch.start_process, args)
      File "/usr/lib/python2.5/site-packages/duplicity/robust.py", line
37, in check_common_error
        return function(*args)
      File "/usr/lib/python2.5/site-packages/duplicity/patchdir.py",
line 544, in start_process
        assert ropath.index == (), ropath.index

The good news is that if you run verify now it will at least detect
something is terribly wrong:

    duplicity verify file:///tmp/backup/interrupted /data
    ...
    verify complete: 1864 files compared, 781 differences found.

Closer inspections shows that it looks like the first incremental backup
deleted the contents of all volumes which were corrupted in the first
full backup.

The second incremental backup restores the deleted contents. This makes
the latest backup restorable again. Now there are fewer data corruptions
but they still exist. I didn't test what happens if I try to use
checkpoint/restore in the incremental backups. I very much doubt it
would work.

Steffen H. (shulegaa) wrote :

I was flabbergast to find that *ANY* interruptions to a long running, multi-volume duplicity (v0.6.10) backup will eventually complete (with enough restart coaxing) but, they will *always* (happily) produce unusable/corrupt duplicity volumes.

I spent a lot of time *NOT* believing that Duplicity can/should only be used over uber-reliable LANs ... and never over the internet/WAN. Thank goodness I test my backups religiously. Duplicity is close to great ... but it sure isn't 1.0 yet. I don't know how regressions to Duplicity's checkpoint/restart feature pass the whole continuous integration/test nightly ... or even a one time QA-at-the-end (of dev/release cycle) test. That is worrisome. Bummer. You have been warned!

Changed in duplicity:
assignee: nobody → Kenneth Loafman (kenneth-loafman)
importance: Undecided → High
milestone: none → 0.6.13
status: New → Confirmed
Changed in duplicity:
status: Confirmed → In Progress

I have just committed a fix for this issue, and a manual test script to use in future testing. It turned out to be a simple fix, but hard to find. The problem was that in certain cases duplicity was restoring the same block twice.

In the testing directory is a script called manual-ctrl-c-test.py which emulates what you did manually. Please use that script or your own methods to verify this fix.

Changed in duplicity:
assignee: Kenneth Loafman (kenneth-loafman) → nobody
status: In Progress → Fix Committed
Changed in duplicity:
milestone: 0.6.13 → none
status: Fix Committed → Fix Released
status: Fix Released → Fix Committed
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package duplicity - 0.6.13-0ubuntu1

---------------
duplicity (0.6.13-0ubuntu1) natty; urgency=low

  * Resync with Debian, no remaining changes
  * New upstream release
    - silent data corruption with checkpoint/restore (LP: #613244)
    - Assertion error "time not moving forward at appropriate pace"
      (LP: #579958)
  * debian/patches/04future.dpatch:
    - Dropped, applied upstream
 -- Michael Terry <email address hidden> Sun, 03 Apr 2011 20:40:34 -0400

Changed in duplicity (Ubuntu):
status: New → Fix Released
Michael Terry (mterry) wrote :

I've uploaded SRU packages to lucid and maverick's -proposed. Here's a script to help reproduce it.

Run the script with a directory to backup (to /tmp).
Wait for the prompt, press enter
Then press Ctrl+C after a bit to interrupt the backup
You'll be asked if you want to repeat, type 'y' and press enter
Then press Ctrl+C again, repeat a few times.
Type 'n' and press enter to the repeat question and then wait for it to finish.

Changed in duplicity (Ubuntu Lucid):
status: New → Confirmed
Changed in duplicity (Ubuntu Maverick):
status: New → Confirmed

Hello Liraz, or anyone else affected,

Accepted duplicity into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in duplicity (Ubuntu Maverick):
status: Confirmed → Fix Committed
tags: added: verification-needed
Changed in duplicity (Ubuntu Lucid):
status: Confirmed → Fix Committed
Clint Byrum (clint-fewbar) wrote :

Hello Liraz, or anyone else affected,

Accepted duplicity into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Mike O'Donnell (mikeodonnell) wrote :
Download full text (4.3 KiB)

Michael Terry,

This script (manual-ctrl-c-test.sh) hung expecting a second argument.

mikodo@mikodo-desktop:~$ #!/bin/bash
mikodo@mikodo-desktop:~$
mikodo@mikodo-desktop:~$ cd /tmp
mikodo@mikodo-desktop:/tmp$ rm -rf backup* restore* ~/.cache/duplicity/test-backup1
mikodo@mikodo-desktop:/tmp$
mikodo@mikodo-desktop:/tmp$ echo "***** Do not disturb the control backup! "
***** Do not disturb the control backup!
mikodo@mikodo-desktop:/tmp$ PASSPHRASE=foo duplicity --name=test-backup1 $1 file:///tmp/backup1
Command line error: Expected 2 args, got 1
Enter 'duplicity --help' for help screen.
mikodo@mikodo-desktop:/tmp$
mikodo@mikodo-desktop:/tmp$ echo -n "***** Now hit Ctrl-C at random intervals. Hit any key to continue... "
***** Now hit Ctrl-C at random intervals. Hit any key to continue... mikodo@mikodo-desktop:/tmp$ read -e DUMMY
echo
mikodo@mikodo-desktop:/tmp$
mikodo@mikodo-desktop:/tmp$ trap "pkill -INT duplicity" SIGINT
mikodo@mikodo-desktop:/tmp$
mikodo@mikodo-desktop:/tmp$ RC=4
mikodo@mikodo-desktop:/tmp$ while [ "$RC" == "4" ]; do
> PASSPHRASE=foo duplicity --name=test-backup2 --volsize 1 -v5 $1 file:///tmp/backup2
> RC=$?
> echo "Exit == $RC"
> if [ "$RC" != "4" ] && [ "$RC" != "0" ]; then
> echo "Repeat? "
> read -e REPLY
> if [ "$REPLY" == "Y" ] || [ "$REPLY" == "y" ]; then
> RC=4
> continue
> else
> break
> fi
> fi
> done
Command line error: Expected 2 args, got 1
Enter 'duplicity --help' for help screen.
Using temporary directory /tmp/duplicity-sD7fa7-tempdir
Exit == 2
Repeat?

mikodo@mikodo-desktop:/tmp$ trap - SIGINT
mikodo@mikodo-desktop:/tmp$
mikodo@mikodo-desktop:/tmp$ if [ "$RC" != "0" ]; then
> echo "Finishing backup..."
> PASSPHRASE=foo duplicity --name=test-backup2 $1 file:///tmp/backup2
> fi
Finishing backup...
Command line error: Expected 2 args, got 1
Enter 'duplicity --help' for help screen.
mikodo@mikodo-desktop:/tmp$
mikodo@mikodo-desktop:/tmp$ echo "Restoring backups..."
Restoring backups...
mikodo@mikodo-desktop:/tmp$ PASSPHRASE=foo duplicity --name=test-backup1 file:///tmp/backup1 /tmp/restore1
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
Traceback (most recent call last):
  File "/usr/bin/duplicity", line 1241, in <module>
    with_tempdir(main)
  File "/usr/bin/duplicity", line 1234, in with_tempdir
    fn()
  File "/usr/bin/duplicity", line 1188, in main
    restore(col_stats)
  File "/usr/bin/duplicity", line 538, in restore
    restore_get_patched_rop_iter(col_stats)):
  File "/usr/bin/duplicity", line 560, in restore_get_patched_rop_iter
    backup_chain = col_stats.get_backup_chain_at_time(time)
  File "/usr/lib/python2.6/dist-packages/duplicity/collections.py", line 939, in get_backup_chain_at_time
    raise CollectionsError("No backup chains found")
CollectionsError: No backup chains found

mikodo@mikodo-desktop:/tmp$ PASSPHRASE=foo duplicity --name=test-backup2 file:///tmp/backup2 /tmp/restore2
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
Traceback (most recent call last):
  File "/usr/bin/duplicity", line 1241,...

Read more...

Michael Terry (mterry) wrote :

Mike, did you pass it a folder to back up? I did not make the script very error-proof, it might blow up without an argument. Give it a folder like your Downloads directory or something.

Mike O'Donnell (mikeodonnell) wrote :

Sorry Michael, I don't know how to do that.

Michael Terry (mterry) wrote :

Mike, try running it like:

sh manual-ctrl-c-test.sh ~/Downloads

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package duplicity - 0.6.08b-0ubuntu2.1

---------------
duplicity (0.6.08b-0ubuntu2.1) lucid-proposed; urgency=low

  * debian/patches/04dontrestoresameblock.dpatch:
    - Fix possible backup corruption bug when resuming backups
      LP: #613244
  * debian/patches/06shamismatch.dpatch:
    - Fix another possible backup corruption bug when resuming backups
      LP: #487720
 -- Michael Terry <email address hidden> Tue, 16 Aug 2011 14:15:12 -0400

Changed in duplicity (Ubuntu Lucid):
status: Fix Committed → Fix Released
Adolfo Jayme (fitojb) on 2013-07-06
Changed in duplicity (Ubuntu Maverick):
status: Fix Committed → Won't Fix
Changed in duplicity (Ubuntu):
importance: Undecided → High
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers