Critical: fix non-convergent scenario

Bug #1247530 reported by Jason Gerard DeRose on 2013-11-03
Assigned to: Jason Gerard DeRose

Bug Description

I'm still sorting out the details, but I've found a non-convergent scenario: a case where Dmedia won't get all the user files up to 3 copies, even though there is sufficient storage available.

This is what led to the non-convergent scenario:

1) I started with a Dmedia library containing 8 drives but insufficient storage (in my case, there were still several hundred files with only 2 copies due to lack of storage); I believe this scenario can only occur if you have at least 4 drives, but that still needs more investigation

2) I added a new drive, to bring the total storage in the library up to an amount that will allow for 3 copies of all the user files

3) I manually purged a few smaller drives that I needed to re-purpose, although I'm not sure this was necessary to create the non-convergent scenario... AFAIK, the important thing is that all the previous drives in the library were full before I added the new drive

I'm now in a situation where Dmedia is stuck with hundreds of files that have only 2 copies, even though there is nearly 3 TB of free space still on my new drive. The key is that there is already a copy of all these fragile files on the new drive, yet the new drive is the only one with any available space. So what needs to happen is some non-fragile files need to be brought up to 4 copies (by creating a copy on the new drive) so that space can be reclaimed on some of the full drives that don't already contain a copy of the fragile files.
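To make the stuck state concrete, here's a minimal model of it. All the drive names, the scaled-down sizes, and the `eligible_stores()` helper are illustrative assumptions, not Dmedia's actual API; the one rule it encodes is the copy-increasing eligibility test described in this report.

```python
# Scaled-down stand-in for Dmedia's real MIN_FREE_SPACE threshold.
MIN_FREE_SPACE = 1

def eligible_stores(file_id, drives, file_size=1):
    """Drives that could hold a new copy of this file: they must not
    already contain it, and must keep MIN_FREE_SPACE free afterward."""
    return [
        name for (name, d) in drives.items()
        if file_id not in d['files']
        and d['free'] - file_size >= MIN_FREE_SPACE
    ]

drives = {
    'old1': {'free': 0, 'files': {'fragile', 'stable'}},
    'old2': {'free': 0, 'files': {'stable'}},
    'old3': {'free': 0, 'files': {'stable'}},
    'new':  {'free': 100, 'files': {'fragile'}},  # only drive with space
}

# 'fragile' has just 2 copies, yet no drive can take a 3rd: the old
# drives are full, and the new drive already holds a copy.
assert eligible_stores('fragile', drives) == []

# But 'stable' (3 copies, all on old drives) *can* gain a 4th copy on
# the new drive, after which its copy on old2 or old3 could be
# reclaimed, finally making room for a 3rd copy of 'fragile'.
assert eligible_stores('stable', drives) == ['new']
```

Running the two assertions shows both halves of the scenario: why the current behavior is stuck, and why a 4th-copy-then-reclaim step would unstick it.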

I'm working on coming up with a good model of the scenario so we can build unit tests for it, etc.

Ideally we can fix this with a small tweak to the existing copy increasing behavior rather than adding an entirely new special case behavior, but I'm not sure on that yet.

Although this is a very serious bug, from what I understand so far, this bug couldn't itself cause data loss. But it does mean Dmedia would be stuck in a state where there is less statistical breathing room than we want, where Dmedia isn't as resilient as it should be when it comes to metadata being out of sync between devices, to catastrophic hardware failure, and to human error.

Jason Gerard DeRose (jderose) wrote :

So after having a day to think about this, I do feel a bit silly for not catching this sooner, but that's how it goes.

Although surprising at first glance, this scenario is more straightforward than I first thought. However, it does seem like fixing this really needs a 5th behavior. Dmedia has had 4 automatic background behaviors for a long time:

1) Verification - Dmedia makes sure the metadata matches reality, and that the files have perfect file integrity; this includes MetaStore.scan(), MetaStore.relink(), MetaStore.verify_by_downgraded(), MetaStore.verify_by_mtime(), and MetaStore.verify_by_verified()

2) Downgrading - Dmedia automatically lowers its confidence in the aspects of reality it hasn't been able to verify for a certain amount of time; this includes MetaStore.purge_or_downgrade_by_store_atime(), MetaStore.downgrade_by_mtime(), and MetaStore.downgrade_by_verified()
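The downgrade-by-verification-age idea can be sketched in a few lines. The window length and the doc shape here are assumptions for illustration, not Dmedia's real values:

```python
import time

# Assumed verification window (e.g. two weeks, in seconds); Dmedia's
# actual threshold may differ.
VERIFY_WINDOW = 14 * 24 * 3600

def needs_downgrade(copy_doc, now=None):
    """Confidence in a copy lapses once its last successful
    verification is older than the window."""
    now = time.time() if now is None else now
    return (now - copy_doc.get('verified', 0)) > VERIFY_WINDOW

# A copy verified just now is still trusted; a stale one is not.
assert needs_downgrade({'verified': 1000}, now=1000 + 60) is False
assert needs_downgrade({'verified': 1000}, now=1000 + VERIFY_WINDOW + 1) is True
```

The point of the sketch is that each downgrade avenue is a pure function of the doc plus a clock, which is what makes these behaviors easy to drive from CouchDB views.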

3) Copy increasing - when there are user files with less than 3 copies of durability, Dmedia will create new copies on any FileStore (drives) such that at least MIN_FREE_SPACE will still remain after creating the new copy, by either copying the file from one locally connected drive to one or more other locally connected drives, or by downloading the copy from a peer on the local network; this is performed by the vigilance worker, driven by MetaStore.iter_actionable_fragile()

4) Copy decreasing - when there is a locally connected drive with less than RECLAIM_BYTES of free space available, and when there are copies on that drive that can be deleted while still leaving the file with a durability of 3, those copies are automatically deleted (reclaimed) from that drive, starting with the least recently used file (based on doc.atime)
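Behavior (4) amounts to a filter plus an LRU sort. A minimal sketch, assuming a simplified doc shape (`stored` as a set of store ids, `atime` as a timestamp); the helper name is hypothetical:

```python
# Stand-in for Dmedia's real RECLAIM_BYTES threshold (illustrative).
RECLAIM_BYTES = 64 * 2**30

def reclaim_order(docs, store_id):
    """Copies on `store_id` that can be deleted while the file keeps a
    durability of at least 3, least recently used (doc['atime']) first."""
    candidates = [
        doc for doc in docs
        if store_id in doc['stored'] and len(doc['stored']) > 3
    ]
    return sorted(candidates, key=lambda doc: doc['atime'])

docs = [
    {'_id': 'a', 'stored': {'s1', 's2', 's3', 's4'}, 'atime': 50},
    {'_id': 'b', 'stored': {'s1', 's2', 's3'},       'atime': 10},
    {'_id': 'c', 'stored': {'s1', 's2', 's3', 's4'}, 'atime': 20},
]

# 'b' is protected (only 3 copies); 'c' goes before 'a' (older atime).
assert [d['_id'] for d in reclaim_order(docs, 's1')] == ['c', 'a']
```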

I'd describe the 5th behavior, which I think is probably the best fix for this problem, something like this:

5) Shuffling - when a drive in the library has less than RECLAIM_BYTES available and contains files with a durability of 3, and when there is a locally connected drive upon which at least MIN_FREE_SPACE would remain after creating a 4th copy of a file, Dmedia will create new copies of these files by copying them from locally connected drives or downloading them from peers on the local network; this behavior creates the needed scenario under which behavior (4) can reclaim space on the first drive
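A sketch of how Shuffling might select its work, under the same simplified doc shape as above (again, names and sizes are illustrative assumptions, not Dmedia's API):

```python
MIN_FREE_SPACE = 1  # scaled-down stand-in for the real threshold

def shuffle_candidates(full_store, free_by_store, docs, sizes):
    """For each durability-3 file on the full store, find another store
    that lacks the file and could hold a 4th copy with MIN_FREE_SPACE
    to spare; behavior (4) can then reclaim the copy on the full store."""
    for doc in docs:
        if full_store not in doc['stored'] or len(doc['stored']) != 3:
            continue
        for sid, free in free_by_store.items():
            if sid in doc['stored']:
                continue
            if free - sizes[doc['_id']] >= MIN_FREE_SPACE:
                yield (doc['_id'], sid)
                break

docs = [{'_id': 'stable', 'stored': {'old1', 'old2', 'old3'}}]
free_by_store = {'old2': 0, 'new': 100}
sizes = {'stable': 1}

# 'stable' on the full drive 'old1' can get a 4th copy on 'new'.
assert list(shuffle_candidates('old1', free_by_store, docs, sizes)) == [('stable', 'new')]
```

Note this is exactly the cross-store reasoning the other 4 behaviors avoid: the decision needs both the free space on every connected store and the copy locations of every file on the full one.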

David Jordan brought up the very good point that we need to consider how file "pinning" affects this. Currently, files are never reclaimed when they're pinned. David also suggested that pinning be a convergence behavior... that pinning doesn't necessarily reflect the current state Dmedia is in, it reflects the state Dmedia should be moving toward, and if needed for data safety reasons, Dmedia might temporarily ignore the user's pinning requests.

Still more thought/experimentation/testing needed on this, but I think we're making progress.

David Jordan (dmj726) wrote :

Yes, "pinning" files should be a convergence behavior. Otherwise there may be scenarios in which "pinning" presents an obstacle to data safety.
With "pinning" as a convergence behavior, we get a few advantages:
    1) the Shuffling behavior works properly
    2) we view pinning as something to be achieved over a period of time, resuming after reboots etc.
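One way to picture pinning-as-convergence is a reclaim check where the pin is a preference that data safety can temporarily override, rather than a hard invariant. Field names and the `safety_override` flag are assumptions for illustration:

```python
def may_reclaim(doc, store_id, safety_override=False):
    """Pin-aware reclaim check (sketch): pinned copies are normally
    protected, but treating pinning as a target state rather than an
    invariant lets Dmedia temporarily override it when data safety
    demands (e.g. to let Shuffling free up a full drive)."""
    if len(doc['stored']) <= 3:
        return False  # never reclaim below a durability of 3
    if store_id in doc.get('pinned', set()) and not safety_override:
        return False  # honor the pin under normal conditions
    return True

doc = {'stored': {'a', 'b', 'c', 'd'}, 'pinned': {'a'}}

assert may_reclaim(doc, 'a') is False                        # pin wins normally
assert may_reclaim(doc, 'a', safety_override=True) is True   # safety can override
assert may_reclaim(doc, 'b') is True                         # unpinned 4th copy
assert may_reclaim({'stored': {'a', 'b', 'c'}}, 'a') is False  # durability floor
```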

Transferring pinned files should be done as quickly as I/O allows. Users will expect "pinning" to work as quickly as copying files to another drive usually takes. It might still be a long-running operation involving the transfer of terabytes of assets, which is why viewing it as a convergent, continuing process is a good idea.

Something that *might* be possible is that Shuffling could interfere with "pinning". In essence:
    Should we prioritize a specific level of data safety over data usefulness?
If the user needs to "pin" a project's worth of data to a drive for transfer, and Shuffling interferes with the "pinning" process, this could stop a partner from getting the assets needed to begin working. I'm not sure such a scenario is possible. It could be prevented by marking such drives as "transient" to indicate they shouldn't be used for Shuffling and that "pinning" takes precedence.

In any case, the user should be kept apprised of the status of all incomplete "pinning" processes and informed when the "pinning" process has completed for a specified group of files. The dmedia indicator could list incomplete (due to Shuffling) and in-progress "pinning" groups. And a notification should announce when a "pinning" group has completed. Likewise a notification should appear when a "pinned" group becomes incomplete due to Shuffling, so the user doesn't mistakenly send an incomplete set of files.

Jason Gerard DeRose (jderose) wrote :

A progress update on this: to help me clarify some issues, I've started with some refactoring of the current copy-increasing code. There was still a lot of stuff that was difficult to unit test (and sometimes lacked unit tests), so I've been breaking things down into simpler units of functionality.

I think I'd like to attack this bug from more than one angle, to eventually have more than one behavior that can address this scenario. For example, the current downgrading behavior has two independent avenues by which files will be downgraded. The quicker behavior is based on the store atime, but if that happened to fail for whatever reason, files would eventually be downgraded based on their verification timestamp. Of course, each of these mechanisms should work in a vacuum, should never count on another mechanism taking up the slack.

One thing Dmedia doesn't do yet is speculatively create new copies based on recent access patterns. For example, if you've been working with a certain set of files often on your laptop, and those files don't yet exist on your workstation, it would be reasonable for Dmedia to create copies of these files on your workstation even before you try to use them on the workstation (which would trigger the on-demand download from your laptop).
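A sketch of what "speculative copies based on recent access patterns" might look like as a selection rule. The window, the doc shape, and the helper are all assumptions, not existing Dmedia functionality (the report explicitly says Dmedia doesn't do this yet):

```python
import time

# Assumed "recently used" horizon, purely illustrative.
RECENT_WINDOW = 7 * 24 * 3600

def speculative_candidates(docs, local_store, now=None):
    """Files accessed within the window that have no copy on this
    machine yet; creating one ahead of time avoids an on-demand
    download later."""
    now = time.time() if now is None else now
    return [
        doc['_id'] for doc in docs
        if local_store not in doc['stored']
        and now - doc['atime'] <= RECENT_WINDOW
    ]

now = 2 * RECENT_WINDOW
docs = [
    {'_id': 'a', 'stored': {'laptop'},                'atime': now - 100},  # recent, missing here
    {'_id': 'b', 'stored': {'laptop', 'workstation'}, 'atime': now - 100},  # already local
    {'_id': 'c', 'stored': {'laptop'},                'atime': 0},          # stale
]

assert speculative_candidates(docs, 'workstation', now=now) == ['a']
```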

As Dmedia can quickly respond to low drive space events (even when space is being used up by applications other than Dmedia), it's entirely reasonable for Dmedia to keep your drives nearly full at all times, as this provides better availability between devices, and means Dmedia has the wiggle room needed to do any needed shuffling.

So I'm thinking of this in terms of two complementary actions, one that would be provided as a tweak to the current copy-increasing behavior, the other that would be implemented in our new 5th "shuffling" behavior:

1) preemptively create a 4th copy based on access patterns, assuming space is available

2) reactively create a 4th copy when the very specific scenario in this bug is detected

I'm strongly leaning toward doing (1) first, because I always prefer to improve/fix existing functionality before adding new functionality.

I'm also a bit leery about adding a new behavior without a lot of thought and care. The current 4 behaviors were a *long* time in the making, and they're very foolproof because they're driven by the data model and the CouchDB view functions. It's a design that has proven extremely robust even in the face of multiple peers all simultaneously updating the metadata and creating untold conflicts. Not only that, but these 4 behaviors are ongoing simultaneously even on a single node. So it wasn't an easy feat to get them all interacting well together, to make sure the overall Dmedia behavior always moves in the correct direction over time.

This 5th behavior also crosses some data boundaries in a way the others don't (because it must consider the available free space on multiple stores, plus the files present on each). So it's not easy to build this 5th behavior using the same tried and true patterns of the other 4. This 5th behavior is important, and I'm certain we need to add it, but it's complex new territory and should be regarded with...


Changed in dmedia:
milestone: 13.11 → 13.12
Changed in dmedia:
milestone: 13.12 → 14.01
Changed in dmedia:
milestone: 14.01 → 14.02
Changed in dmedia:
milestone: 14.02 → 14.03
Changed in dmedia:
milestone: 14.03 → 14.04
Changed in dmedia:
milestone: 14.04 → 14.05