Add --test-restore option (like --verify but compares to local files)

Bug #643973 reported by Aaron Whitehouse on 2010-09-20
This bug affects 7 people
Affects: Duplicity
Importance: Medium
Assigned to: Unassigned

Bug Description

Further to this Answer:
https://answers.launchpad.net/duplicity/+question/116587
it would be excellent if duplicity offered a --test-restore option (as an option when *backing up*) that, after the backup, ensured that each of the files that were backed up could be successfully restored to the current local copy. As each backed-up file can be spread over many diffs that need to be assembled at restore time, there is a very real risk that something could go wrong. Knowing this before one needs that backup is crucial.

"--test-restore
 After completing the backup, this will run a 'test restore' on each file included in the
 backup, to test that the remote archive could be restored to exactly match the local
 files. Duplicity will download the archive files from the remote location, decompress
 and decrypt them, and compare them to the local copy of the files. This will alert
 you to any problems in restoring your remote backup (for example, if one of your
 remote archives has become corrupted). Duplicity will exit with a non-zero error level if any
 files are different or if the remote archive(s) could not be successfully compared."

My current thinking is that the easiest way to do this would be to:
(1) take the list of files included in the backup;
(2) throw them into a list;
(3) pop off one at a time;
(4) for each, do a --file-to-restore restore of the latest version to [cache]/temp-restored/[file];
(5) cmp (http://linux.die.net/man/1/cmp) that restored file to the filesystem version and check for a zero exit value (we could do a hash, but to me that seems like extra computation for no benefit);
(6) repeat steps 3 to 5; and
(7) report any errors.
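
The steps above can be sketched roughly as follows. This is only an illustration of the proposed loop, not duplicity code: `run_test_restore` and the `restore_one` hook are hypothetical names, and the hook is assumed to perform a genuine single-file restore (for example by invoking duplicity with --file-to-restore) and to raise an exception on failure. `filecmp.cmp` with `shallow=False` stands in for cmp, since it also compares file contents byte for byte.

```python
import filecmp
import os
import tempfile
from typing import Callable, Iterable, List, Tuple


def run_test_restore(
    rel_paths: Iterable[str],
    source_root: str,
    restore_one: Callable[[str, str], None],
) -> List[Tuple[str, str]]:
    """Restore each backed-up file to a scratch directory and compare it
    with the local copy. Returns (rel_path, reason) for every failure.

    restore_one(rel_path, dest) is a hypothetical hook, not a duplicity API.
    """
    pending = list(rel_paths)              # steps 1-2: the list of backed-up files
    failures: List[Tuple[str, str]] = []
    with tempfile.TemporaryDirectory(prefix="temp-restored-") as scratch:
        while pending:
            rel_path = pending.pop()       # step 3: pop off one at a time
            dest = os.path.join(scratch, rel_path)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            try:
                restore_one(rel_path, dest)    # step 4: single-file restore
            except Exception as exc:
                failures.append((rel_path, f"restore failed: {exc}"))
                continue
            local = os.path.join(source_root, rel_path)
            if not os.path.isfile(dest):
                failures.append((rel_path, "restore produced no file"))
            elif not os.path.isfile(local):
                failures.append((rel_path, "local file is missing"))
            # step 5: shallow=False forces a content comparison, like cmp
            elif not filecmp.cmp(local, dest, shallow=False):
                failures.append((rel_path, "restored file differs from local file"))
    return failures                        # step 7: report any errors
```

A caller could then exit with a non-zero status whenever the returned list is non-empty, which matches the behaviour described in the proposed man page text.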

Nice future enhancements would be:
(1) to order the files to be tested according to which archive they are in, so that a 10MB archive file full of 100KB files is only downloaded once; and
(2) to offer the ability to do a full (non-diff) backup of those files that failed to restore.
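
As a rough sketch of enhancement (1): if a mapping from each file to the archive volumes holding its data were available, the test order could simply be sorted by volume so that files sharing a volume are tested back to back. The mapping `file_volumes` and the function name are assumptions made for illustration; I do not know whether duplicity exposes this information in this form.

```python
from typing import Dict, Iterable, List


def order_for_testing(file_volumes: Dict[str, Iterable[int]]) -> List[str]:
    """Return paths ordered by the volumes they live in, then by name.

    file_volumes is a hypothetical mapping of path -> volume numbers.
    Files stored in the same volume end up adjacent, so a caching
    downloader would only need to fetch each volume once.
    """
    return sorted(file_volumes, key=lambda path: (sorted(file_volumes[path]), path))
```

For example, `{"z.txt": [1], "a.txt": [2], "m.txt": [1, 2]}` would be tested in the order z.txt, m.txt, a.txt.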

I thought about doing a bash script to do this, but I think it would be useful as part of the main program and it should be easier to get the list of backed up files from within duplicity. I hoped to have a go at this myself, but will not have a chance anytime soon, so wanted to get it in the system. I also have zero experience with duplicity or python, so don't get your hopes up!

I think that it is incredibly important that we do not try to be too clever with this option and that we use only those code paths that are used by a genuine restore. If there is any real chance that this test-restore could pass while a real restore would fail, there isn't really any use for it.

Clearly bandwidth is going to be an issue for anyone backing up remotely. For those like me who back up to local hard drives or over a LAN, this would provide, with little downside, much more confidence than staring at a list of encrypted diff files. As I normally plug the external HDD into the server and leave it overnight, I would run a test-restore every time I backed up, if it existed.

I understand that --verify performs a similar role (though I find it very confusing), in that it verifies the remote archives match the signatures for specified files. I see test-restore as much closer to answering the question that I want to know *for my use case*, which is "if the muck hit the fan and my house blew up, would I have a safe and restorable backup of all of my data?"

I also think that it makes more sense for my use case to have this as an option to the backup line, rather than as a variant of the restore line, as what I am interested in is that I have a restorable version of all the files I am backing up, rather than that I can restore all the files that happen to be on the remote location (that distinction shouldn't be important, but introduces an extra layer). In my case, it is also easier because I have a very complex set of excludes and includes, so I would rather not risk the possibility of the backup command getting out of sync with the test-restore line and have things fall through gaps.

I note that --verify is still a very important command for other use cases, such as testing whether a system could be restored to the state it was in 3 days ago, which would clearly be different from the local files.

I note that an obvious case where the test restore would throw an error would be where the file has changed in the seconds since the backup (say, if a log file is being backed up). In that case, a simple warning could be given saying that:
- the file restored correctly;
- the file matched its original hash (I believe that this is part of the restore function);
- the restored file differs from the local file; but
- this was expected, as the "modified time" of the local file shows that the local file has changed since the backup.
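
The warning logic above could look something like this. It is a sketch under assumptions: `Outcome`, `classify` and the `backup_time` argument (the time the backup was taken, in seconds since the epoch) are names invented here, and the caller is assumed to have already restored the file and compared it with the local copy.

```python
import os
from enum import Enum


class Outcome(Enum):
    MATCH = "match"
    CHANGED_SINCE_BACKUP = "changed since backup"   # warn, do not fail
    MISMATCH = "mismatch"                           # genuine error


def classify(files_identical: bool, local_path: str, backup_time: float) -> Outcome:
    """Decide whether a difference between restored and local file is expected."""
    if files_identical:
        return Outcome.MATCH
    # A local modified time later than the backup explains the difference.
    if os.path.getmtime(local_path) > backup_time:
        return Outcome.CHANGED_SINCE_BACKUP
    return Outcome.MISMATCH
```

Only `Outcome.MISMATCH` would then need to produce a non-zero exit; a file such as a busy log would be reported as a warning instead.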

Implemented as the --compare-data option when doing a verify.
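
For anyone scripting this, a minimal wrapper might look like the sketch below. The function names are made up for illustration, and treating a zero exit status as "everything matched" is an assumption that should be checked against the man page for your duplicity version. Any includes and excludes used for the backup would have to be repeated on the verify line, and are not shown here.

```python
import subprocess
from typing import List


def build_verify_command(backup_url: str, local_dir: str) -> List[str]:
    """Build a 'duplicity verify --compare-data' command line."""
    return ["duplicity", "verify", "--compare-data", backup_url, local_dir]


def verify_against_local(backup_url: str, local_dir: str) -> bool:
    """Run the verify; assumes exit status 0 means the data matched."""
    result = subprocess.run(build_verify_command(backup_url, local_dir))
    return result.returncode == 0
```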

Changed in duplicity:
status: New → Fix Released
Changed in duplicity:
importance: Undecided → Medium