Inconsistent Verify: Invalid data - SHA1 hash mismatch : ErrorReturnCode=21

Bug #930866 reported by Rodrigo Alvarez
This bug affects 4 people
Affects: Duplicity
Status: Confirmed
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

Short story:

When I run duplicity in verify mode on a given set, it sometimes passes and sometimes fails, reporting:
Invalid data - SHA1 hash mismatch for file:
 SOME_VOLUME.difftar.gpg
 Calculated hash: SOME_HASH
 Manifest hash: ANOTHER_HASH

If I run it again, SOME_VOLUME gets through and ANOTHER_VOLUME reports the error. If I restart the server, then verify passes (sometimes).

Yes, the hard drive where the volumes are stored is healthy (as reported by SMART), and a diff against its off-site mirror reports no differences.
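
For reference, a rough way to cross-check a flagged volume outside of duplicity (the path below is only a placeholder for wherever the set actually lives):

# Re-hash the volume duplicity complained about, several times in a row.
# Sums that agree with each other but never match the manifest point at
# the stored file; sums that differ between runs mean the reads
# themselves are not stable.
VOL=/backup/primary_backup/SOME_VOLUME.difftar.gpg
for i in 1 2 3; do sha1sum "$VOL"; done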

The details:
I have several backup sets {Projects, Documents, Code, etc...} on Ubuntu 10.04 which I back up with duplicity 0.6.17 to an internal hard drive--the primary backup target.

Once a month I run duplicity in verify mode to make sure that my primary backup is still a backup. This month two sets failed. This was not a problem, as I have an offsite mirror that was synced 15 days ago. I ran `diff primary_backup mirror_backup` and the supposedly corrupt volumes were identical; as expected, the only differences were the backup volumes generated during the last 15 days. Weird, right? So I run duplicity verify on the primary_target, and another volume, deeper in the set, reports a hash mismatch. Weirder, right? Now I run duplicity verify on the mirror: it passes.
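
For completeness, the same comparison can be done by checksum rather than a byte-for-byte diff; any line that shows up in only one list is a real on-disk difference. The paths here are placeholders for my layout:

# Build a SHA1 list for each copy of the backup and compare them.
( cd /backup/primary_backup && sha1sum *.difftar.gpg ) > /tmp/primary.sha1
( cd /backup/mirror_backup && sha1sum *.difftar.gpg ) > /tmp/mirror.sha1
diff /tmp/primary.sha1 /tmp/mirror.sha1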

Somehow my primary backup got messed up, right? I clobber it, replace it with its older mirror, and run duplicity verify to make sure all is good. It fails; the primary_backup fails. I run another `diff primary_backup mirror_backup` and now they are identical. OK, so the drive that stores the primary_backup is failing, but wait: a SMART long test says it is perfectly healthy and has no bad blocks or read/write errors in its entire history (I run these once a week). I delete the duplicity cache and temp files and rerun duplicity verify on the primary_target, and it fails: a new volume with a bad hash. Now I go brute force and run duplicity verify 10 times, and each time a different volume shows a bad hash. Sometimes volume 3, sometimes volume 1000. What?
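
For the record, the brute-force loop was essentially the sketch below (the URL, source directory, and passphrase are placeholders; duplicity reads the passphrase from the PASSPHRASE environment variable so verify can run unattended):

# Run verify repeatedly and keep only the mismatch reports, so the
# failing volume from each pass ends up in one log for comparison.
export PASSPHRASE=placeholder
for i in $(seq 1 10); do
  echo "=== pass $i ===" >> /tmp/verify.log
  duplicity verify file:///backup/primary_backup /data/some_set 2>&1 \
    | grep -A3 'hash mismatch' >> /tmp/verify.log
done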

I now reboot the server--a few days have gone by and I've spent about 5 hrs looking at this--and I run duplicity verify on the primary_backup and it passes. What? This is the exact data that had failed 10 times? Seriously? Never mind, this is what I expected and how things should be. Just to make sure all is good, I rerun duplicity verify on all my sets and wait. What? Now two new sets report errors, but the ones that originally reported errors are fine.

Help! Can we trust verify? Why does it fail sometimes and pass other times? Can it be the hard drive, even if diff and SMART pass? Is there a bug in duplicity or gpg?

Revision history for this message
Michael Terry (mterry) wrote :

So this is the same issue as old bug 487720. I've seen several complaints since that fix landed about this issue rearing its head again. It seems we didn't fully fix it back then.

See my comment https://bugs.launchpad.net/duplicity/+bug/487720/comments/54 for some things you can do to recover from this.

Changed in duplicity:
status: New → Confirmed
Revision history for this message
nacs8quad (nacs8quad) wrote :

I am having the same problem, I think. Here is my backup / restore test script:

#!/bin/bash
# Cleanup existing files
rm -rf /mnt/test_backup/*
rm -rf /mnt/test_restore/*

echo "Start time: $(date +%s)"
#Variables
VBOXUSER="vboxuser"
VGROUP="vg"
LVOLUME="vm_images"
SNAPSHOT_MNT="/mnt/snap_$LVOLUME"

#Create the snapshot
lvcreate -L 20G -s -n snap_$LVOLUME /dev/$VGROUP/$LVOLUME

# Make mount point directory
mkdir $SNAPSHOT_MNT

echo "Mounting the snapshot"
mount -o ro /dev/$VGROUP/snap_$LVOLUME $SNAPSHOT_MNT

# Use duplicity to do the backup.
duplicity --no-encryption --no-compression /mnt/snap_$LVOLUME file:///mnt/test_backup/vg_${LVOLUME}

echo "Unmount and remove snapshot mnt point"
umount $SNAPSHOT_MNT
rmdir $SNAPSHOT_MNT

echo "Remove snapshot logical volume"
lvremove -f /dev/$VGROUP/$LVOLUME

echo "Test Restore, Start Time $(date +%s)"
duplicity restore --no-encryption --no-compression file:///mnt/test_backup/vg_vm_images /mnt/test_restore/restored_vg_vm_images

echo "All Complete, Ending Time $(date +%s)"

and the output (I only have partial output since I was using screen):

Synchronizing remote metadata to local cache...
Deleting local /root/.cache/duplicity/523e1fa62503d77fd52f14d62e0b56ca/duplicity-full-signatures.20161113T065010Z.sigtar.gz (not authoritative at backend).
Deleting local /root/.cache/duplicity/523e1fa62503d77fd52f14d62e0b56ca/duplicity-full.20161113T065010Z.manifest (not authoritative at backend).
Last full backup date: none
No signatures found, switching to full backup.
--------------[ Backup Statistics ]--------------
StartTime 1479020977.87 (Sat Nov 12 23:09:37 2016)
EndTime 1479021871.90 (Sat Nov 12 23:24:31 2016)
ElapsedTime 894.03 (14 minutes 54.03 seconds)
SourceFiles 31
SourceFileSize 37074645133 (34.5 GB)
NewFiles 31
NewFileSize 37074645133 (34.5 GB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 31
RawDeltaSize 37074591885 (34.5 GB)
TotalDestinationSizeChange 37364254208 (34.8 GB)
Errors 0
-------------------------------------------------

Unmount and remove snapshot mnt point
Remove snapshot logical volume
  Logical volume "snap_vm_images" successfully removed
  Logical volume MekilectVMHost-vg/vm_images contains a filesystem in use.
Test Restore, Start Time 1479021894
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Sat Nov 12 23:09:37 2016
Invalid data - SHA1 hash mismatch for file:
 duplicity-full.20161113T070937Z.vol1161.difftar
 Calculated hash: daea1e7e8dd2496529f36ece54860e2528d9d9ea
 Manifest hash: 39a59fdf5a04ba39caa78f1aed2ee4c49c20d833

All Complete, Ending Time 1479022479
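
Since this backup is unencrypted and uncompressed, the flagged volume can be re-hashed directly and compared against the two values printed above; a few runs in a row would also show whether the reads are even stable. Just a sketch of what I would check next:

# Re-hash the volume named in the error and compare with the
# "Calculated" and "Manifest" hashes from the restore output.
for i in 1 2 3; do
  sha1sum /mnt/test_backup/vg_vm_images/duplicity-full.20161113T070937Z.vol1161.difftar
done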

Michael Terry (mterry)
no longer affects: deja-dup
Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote : Update bug: status confirmed, importance medium

 status confirmed
 importance medium
 done

Changed in duplicity:
importance: Undecided → Medium