Compare with old snapshot gives too many files

Bug #1066012 reported by Daniel Betschart
This bug affects 1 person
Affects        Status   Importance   Assigned to   Milestone
Back In Time   New      Undecided    Unassigned

Bug Description

After I make a snapshot, I run the command that compares against the previously created snapshot (rsync -rtDH --dry-run). Instead of "no files are changed", it reports a great many changed files. I do not think that can be right. How can I verify this?

I also want to back up system directories such as /etc and /var, so I run BiT as root. When I back up only my home directory as a normal user, the problem does not occur.

My backup device is formatted with XFS. The source directories are a mix of EXT4, ReiserFS and XFS. Most of them are mounted with the acl and user_xattr options.

Using Ubuntu Lucid with Backintime 1.0.10~lucid from your PPA.

Daniel Betschart (dbet1) wrote :

I forgot to mention that my backup always takes about two hours, even when I run it twice in a row. I think this is because rsync reports too many files as changed.

Dan (danleweb) wrote :

You can check the log file to see which changes it detects (it may be related to ACLs).

Regards,
Dan

Daniel Betschart (dbet1) wrote :

If I run rsync --dry-run right after a backup has finished, I get over 200'000 lines. Some of them follow below. The JPEG file was not changed, and the trash was not emptied. I made an LVM snapshot of the /var filesystem, and a snapshot definitely does not change. What does the character p mean?

BACKINTIME: .f...p..... data/some.jpg
BACKINTIME: .d...p....x data/.Trash-1000/expunged/
BACKINTIME: .d...p..... mnt/lvsnapshots/var/backups/
BACKINTIME: .f...p..... mnt/lvsnapshots/var/backups/dpkg.status.0

Germar (germar) wrote :

man rsync:
"A p means the permissions are different and are being updated to the sender’s value (requires --perms)."
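For reference, the string in front of each path follows rsync's documented YXcstpoguax layout: an update-type character, a file-type character, then one slot per attribute. Below is a minimal Python sketch that decodes log lines like the ones quoted above. The helper names (parse_line, changed_attributes, permissions_only) are made up for this illustration and are not part of rsync or BackInTime, and the sketch assumes the "BACKINTIME:" prefix is a single token, as in the excerpt.

```python
# Sketch: decode rsync --itemize-changes strings such as ".f...p.....".
# Helper names are hypothetical; only the flag letters come from man rsync.

ATTRIBUTE_NAMES = {
    "c": "checksum",
    "s": "size",
    "t": "modification time",
    "p": "permissions",
    "o": "owner",
    "g": "group",
    "a": "ACL",
    "x": "extended attributes",
}


def parse_line(line):
    """Split 'BACKINTIME: <itemized> <path>' into (itemized, path)."""
    _prefix, itemized, path = line.split(None, 2)
    return itemized, path


def changed_attributes(itemized):
    """Return the attributes flagged after the two leading characters."""
    return [ATTRIBUTE_NAMES[ch] for ch in itemized[2:] if ch in ATTRIBUTE_NAMES]


def permissions_only(itemized):
    """True when 'p' is the only attribute slot that is not a dot."""
    return set(itemized[2:]) - {"."} == {"p"}


log_lines = [
    "BACKINTIME: .f...p..... data/some.jpg",
    "BACKINTIME: .d...p....x data/.Trash-1000/expunged/",
]

for line in log_lines:
    itemized, path = parse_line(line)
    print(path, "->", ", ".join(changed_attributes(itemized)))
```

Running permissions_only over a whole log would show how many of the reported lines differ in nothing but their permission bits.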

For compatibility with different filesystems and for a number of other reasons, BackInTime stores the permissions of files in an additional file (fileinfo.bz2) and changes the permissions of the files and folders in the snapshot. They are owned by the user running the backup and are read-only.

If you want to compare the files manually, you'd have to add at least '--no-p --no-g --no-o' to your rsync command to avoid detecting this difference. You should check the log to see which arguments BackInTime uses for rsync and adapt them for your own dry-run.
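As a rough illustration of such a manual comparison, the Python sketch below only assembles an rsync argument list and prints it; it does not run anything. It assumes the -rtDH --dry-run options quoted in the bug description plus --itemize-changes for readable output; the options BackInTime really uses should be taken from its log. The function name and the two example paths are hypothetical.

```python
# Sketch: build (but do not run) a manual rsync dry-run that ignores
# permission, group and owner differences. The paths are placeholders.
import shlex


def build_compare_command(source, snapshot):
    """Return the rsync argument list for a permission-agnostic dry-run."""
    return [
        "rsync",
        "-rtDH",              # options quoted in the bug description
        "--dry-run",          # report only, change nothing
        "--itemize-changes",  # print one itemized line per difference
        "--no-p",             # do not compare or set permissions
        "--no-g",             # do not compare or set the group
        "--no-o",             # do not compare or set the owner
        source,
        snapshot,
    ]


command = build_compare_command("/data/", "/backup/snapshot/data/")
print(" ".join(shlex.quote(arg) for arg in command))
```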

Regards
Germar

Daniel Betschart (dbet1) wrote :

It is not my own dry-run. It is the dry-run that backintime itself performs! And this is why it never detects that no backup is needed.

Maybe this is also why too much data is backed up each time. It feels as if a "full" backup is made on every run. I do not change that much data, yet the backup takes about 2 hours.

I think that Dan should know this.

Germar (germar) wrote :

Okay, sorry I misunderstood you.

(1) Can you please compare the inodes of a file that definitely hasn't changed between two snapshots, using 'ls -li <path>'? The first column is the inode of the file. If BIT created the hardlinks for this file correctly, the inode must be the same in both snapshots. The third column (if the file is not a directory) counts how many links point to this inode and should be >1.

(2) Can you please run test backups (with only a couple of files included) with:
- 'preserve acl' and 'preserve extended attributes' disabled, onto your XFS-formatted device
- both disabled again, onto an ext2 (or 3 or 4) formatted device
- both enabled, onto ext2
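The inode comparison from (1) can also be scripted. The following is a minimal Python sketch using os.lstat; the function name hardlink_report and the temporary demo files are made up for this illustration, and in practice the two arguments would be the paths of the same file in two snapshots.

```python
# Sketch: check whether two paths are hardlinks to the same inode,
# mirroring what 'ls -li' shows. The demo files are throwaway examples.
import os
import tempfile


def hardlink_report(path_a, path_b):
    """Compare device, inode and link count of two paths."""
    stat_a = os.lstat(path_a)
    stat_b = os.lstat(path_b)
    return {
        # Same inode on the same device means the same underlying file.
        "same_file": (stat_a.st_dev, stat_a.st_ino) == (stat_b.st_dev, stat_b.st_ino),
        "inode_a": stat_a.st_ino,
        "inode_b": stat_b.st_ino,
        "links_a": stat_a.st_nlink,  # third column of 'ls -li'
        "links_b": stat_b.st_nlink,
    }


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "old_snapshot_file")
    linked = os.path.join(tmp, "new_snapshot_file")
    with open(original, "w") as handle:
        handle.write("unchanged content\n")
    os.link(original, linked)  # what an unchanged file should look like
    print(hardlink_report(original, linked))
```

For a file that was hardlinked correctly, same_file is True and both link counts are greater than 1.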

Regards,
Germar

Daniel Betschart (dbet1) wrote :

The files are correctly linked on the backup device, I have checked the inodes and the link counters with ls -li. Thank you for this input.

I have backed up a small directory. After the first backup, backintime detects that there are no changes, even with ACLs and extended user attributes turned on. It makes no difference whether they are on or off, and no difference whether I use XFS, EXT or ReiserFS. This is not the problem.

I made an LVM snapshot of /var, mounted it under /mnt/lvsnapshots and then made a backup of /mnt/lvsnapshots. On every run backintime makes a backup because of one changed file: [C] hf...p..... mnt/lvsnapshots/var/lib/postgres/dumpall/7.2/postmaster. But in an LVM snapshot nothing changes at all, I think, unless I change something manually, which is not the case.

This may be one of the problems, but I think I include so many directories in the backup that there is no chance of no file having changed after two hours (the duration of the backup). If only one file has changed, backintime does all its work (cp -aRl, rsync, chmod and the cleanup), and this takes some time. So I think I have to exclude some directories, e.g. /var/spool and so on.

What I don't understand is how backintime includes and excludes files and directories. In the example above, backing up /mnt/lvsnapshots, it uses --include "/mnt/lvsnapshots/" --include "/mnt/" --include "/mnt/lvsnapshots/**" --exclude "*". I think --include "/mnt/lvsnapshots/" should be enough. Can you explain to me why backintime does it this way?

Germar (germar) wrote :

Hi Daniel,
I'm not familiar with LVM snapshots, so this could be wrong. The 'h' in rsync's output indicates that the file is a hardlink. I would assume that the LVM snapshot does not correctly mirror the data of this hardlink when your running postgres changes the data through another hardlink.

IMHO if you back up the system root ('/') or other system folders like /var, you will never have a chance of hitting 'no changes'. You should rather create at least two profiles: one for all your important data that runs e.g. every two hours, and another one for the system root in which you exclude that data. The second one can run once in the middle of the night. To reduce the overall time for this backup you can disable 'Check for changes' for this profile.

You should also think about other backup strategies. BackInTime is nice, but there are some cases where it doesn't make sense. Especially when it comes to databases (postgres, mysql ...) you should rather exclude their storage path and make a (compressed) dump of the entire server. You can use a user-callback script to run this dump every time before the backup starts and include the dump in your profile.
If you just back up the storage path, you may find the database corrupt on the day you need to restore it.

Sorry, I don't understand the include and exclude strategy either. But I know that it works as expected ;-)
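A possible explanation comes from rsync's documented filter behaviour: rules are tried in order and the first match wins, a trailing slash matches directories only, and an excluded directory is never descended into. With --exclude "*" at the end, /mnt/ therefore has to be included so that rsync can reach /mnt/lvsnapshots/ at all, and "/mnt/lvsnapshots/**" is needed to include everything below it. The Python sketch below is a deliberately simplified model of that matching, not rsync's real implementation; all function names are made up for this illustration, and it only handles anchored patterns, '*' and '**'.

```python
# Simplified model of rsync include/exclude matching (NOT rsync's code).
# Assumptions: first matching rule wins, unmatched paths are included,
# and a path is only reached if every parent directory is included.
import re


def pattern_to_regex(pattern):
    """Translate a rule pattern into (compiled regex, directories_only)."""
    dir_only = pattern.endswith("/")
    core = pattern.rstrip("/")
    parts = []
    i = 0
    while i < len(core):
        if core.startswith("**", i):
            parts.append(".*")      # '**' also crosses slashes
            i += 2
        elif core[i] == "*":
            parts.append("[^/]*")   # '*' stops at a slash
            i += 1
        else:
            parts.append(re.escape(core[i]))
            i += 1
    # A leading '/' anchors at the transfer root; otherwise match the tail.
    prefix = "^" if core.startswith("/") else "(?:^|.*/)"
    return re.compile(prefix + "".join(parts) + "$"), dir_only


def included(path, is_dir, rules):
    """Apply the rules in order; the first matching rule decides."""
    for action, pattern in rules:
        regex, dir_only = pattern_to_regex(pattern)
        if dir_only and not is_dir:
            continue
        if regex.match(path):
            return action == "include"
    return True


def is_transferred(path, is_dir, rules):
    """Walk down from the root; stop at the first excluded component."""
    names = path.strip("/").split("/")
    for depth in range(1, len(names) + 1):
        current = "/" + "/".join(names[:depth])
        current_is_dir = is_dir or depth < len(names)
        if not included(current, current_is_dir, rules):
            return False
    return True


BIT_RULES = [
    ("include", "/mnt/lvsnapshots/"),
    ("include", "/mnt/"),
    ("include", "/mnt/lvsnapshots/**"),
    ("exclude", "*"),
]
SHORT_RULES = [("include", "/mnt/lvsnapshots/"), ("exclude", "*")]

target = "/mnt/lvsnapshots/var/backups/dpkg.status.0"
print(is_transferred(target, False, BIT_RULES))    # True
print(is_transferred(target, False, SHORT_RULES))  # False: /mnt is excluded
```

In this model the shorter rule set fails at the very first component, because /mnt matches only the final --exclude "*".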

Regards,
Germar

Daniel Betschart (dbet1) wrote :

I have reduced /var to a few of its subfolders, but backintime does not need less time to back up the data than before. I will check whether this has to do with the LVM snapshot.

Or perhaps the long run time is caused by the find calls that chmod the backed-up directories: first u+wx, then a-w on the last backup, and finally a-w on every file in the new backup. Why? Only so that u+wx can be set on the same directories the next time backintime runs? The same goes for smart remove: I do not think a chmod u+wx on the directories is necessary before deletion. An rm -rf deletes them anyway.
