mhddfs occasionally creates duplicates of files copied to virtual drive

Bug #403524 reported by Christopher Peplin
This bug affects 2 people
Affects: mhddfs (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: mhddfs

Package: mhddfs
Architecture: amd64
Version: 0.1.18-1
Ubuntu version: Jaunty

I have been using mhddfs for the past two Ubuntu releases, and just now I am noticing that occasionally the used disk space reported by "df" is higher than the total disk usage calculated with "durep" or "du". After some investigation, I found that some files were duplicated on multiple hard drives in the virtual array.

My fstab:
mhddfs#/media/disk1,/media/disk2,/media/disk3,/media/disk4 /media/virtual fuse defaults,allow_other 0 0

When I noticed the free disk space mismatch, I used a Perl script to search for files with the same name. It found things like this:

peplin@line:~/bin$ ls -lh /media/disk2/Pictures/Highway\ Systems\ Book/20090613-123706.dng
-rwxrwx--- 1 root plugdev 20M 2009-07-06 23:21 /media/disk2/Pictures/Highway Systems Book/20090613-123706.dng
peplin@line:~/bin$ ls -lh /media/disk3/Pictures/Highway\ Systems\ Book/20090613-123706.dng
-rwxrwx--- 1 root plugdev 20M 2009-07-07 22:41 /media/disk3/Pictures/Highway Systems Book/20090613-123706.dng
peplin@line:~/bin$ ls -lh /media/virtual/Pictures/Highway\ Systems\ Book/20090613-123706.dng
-rwxrwx--- 1 root plugdev 20M 2009-07-06 23:21 /media/virtual/Pictures/Highway Systems Book/20090613-123706.dng

Notice that there are two different copies of the file on disk2 and disk3, and the virtual drive is masking the newer version. These files are being copied to the /media/virtual drive via ssh and using rdiff-backup. That shouldn't make a difference, because I believe it's just copying to /media/virtual like normal and it has no knowledge of there being a disk1, disk2, etc.

It looks like when the new file is copied to the drive, mhddfs occasionally doesn't realize that this file already exists and so it copies a new version to another drive.

Tomas Pospisek (tpo-deb) wrote :

Might be the same thing as this bug in Debian:

http://bugs.debian.org/580785

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mhddfs (Ubuntu):
status: New → Confirmed
Ryan Thorburn (ryan-thorburn) wrote :

I have also seen duplicate files being created, but in my case I believe it happened when one of the underlying drives, the one containing the first copy of the file, dropped out and became unresponsive while my rsync continued copying to the virtual mount point.

Perhaps mhddfs needs to be more aware of the underlying disks: it could check that it can query them correctly and, if not, switch the virtual file system to read-only?
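
Until something like that exists in mhddfs itself, a user-side check before each copy job might help. The following is only a rough sketch, not anything mhddfs provides: the function name `unavailable_branches` is made up for illustration, and the branch paths are the ones from the fstab line in the original report. It only detects a branch that is no longer mounted; a drive that is still mounted but hung would not be caught.

```python
import os
import sys


def unavailable_branches(branches):
    """Return the branch directories that are not currently mount points."""
    return [branch for branch in branches if not os.path.ismount(branch)]


if __name__ == "__main__":
    # Assumption: branch paths copied from the fstab line in this report.
    branches = ["/media/disk1", "/media/disk2", "/media/disk3", "/media/disk4"]
    missing = unavailable_branches(branches)
    if missing:
        print("Not mounted, skipping copy:", ", ".join(missing), file=sys.stderr)
        sys.exit(1)
```

Because the script exits non-zero when a branch is missing, it could be chained in front of rsync or rdiff-backup so the copy only starts when all branches are present.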

I would be very interested in getting a copy of your Perl script to find all of the duplicates within my drives.

Christopher Peplin (chris.peplin) wrote :

Miraculously I still have that script from 2009 although I can't say I've used it since then! https://gist.github.com/peplin/4f2570e046b5bc144352

It's pretty simple, it just looks for duplicate filenames. There's probably something better out there that will double-check whether the contents of the files match. You could add an MD5 or SHA comparison to this script fairly easily if you wanted it.
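
For anyone who wants the content check, here is a minimal sketch of the idea in Python. It is not the Perl script linked above and has not been run against a real mhddfs array. The assumptions: the function names `sha256_of` and `find_duplicates` are made up for illustration, the branch paths are the ones from the fstab line in the original report, and files are matched by their path relative to each branch rather than by bare filename.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(branches):
    """Find relative paths that exist as regular files on more than one branch.

    Returns {relative_path: [(full_path, sha256_hex), ...]}.
    """
    seen = defaultdict(list)
    for branch in branches:
        for dirpath, _dirnames, filenames in os.walk(branch):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # Skip symlinks and anything that is not a regular file.
                if os.path.islink(full) or not os.path.isfile(full):
                    continue
                seen[os.path.relpath(full, branch)].append(full)
    return {
        rel: [(path, sha256_of(path)) for path in paths]
        for rel, paths in seen.items()
        if len(paths) > 1
    }


if __name__ == "__main__":
    # Assumption: branch paths copied from the fstab line in this report.
    branches = ["/media/disk1", "/media/disk2", "/media/disk3", "/media/disk4"]
    for rel, copies in sorted(find_duplicates(branches).items()):
        identical = len({digest for _path, digest in copies}) == 1
        print(f"{rel}: {'identical' if identical else 'DIFFERENT'} copies")
        for path, digest in copies:
            print(f"  {digest[:12]}  {path}")
```

Copies flagged as DIFFERENT are the interesting ones, since those are cases where the virtual drive may be showing an older version than the one most recently written.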

Ryan Thorburn (ryan-thorburn) wrote :

Thank you for the link. I'll run this over my repository tomorrow.
