Add location record, add locations and priority to file record
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Dmedia | Fix Released | High | Jason Gerard DeRose |
Bug Description
The metadata for an entire dmedia library is always available locally in CouchDB, but specific media files may not be. Tracking the locations where a given file is stored is probably the most important thing dmedia does. This information will be used to drive three types of dmedia background tasks:
1) Downloading - files that aren't available locally are downloaded from other locations
2) Uploading - newly imported files are uploaded to other locations
3) Reclaiming - files with sufficient durability (enough copies stored in other locations) or low priority (eg, proxy file that can be rendered again) can be deleted locally to free space
These operations can be explicitly requested, but the goal is also to have them happen automatically based on a prediction algorithm (think branch prediction). The prediction algorithm might take a long time to refine enough to be useful, but having it work well (or even at all) right now isn't a priority. However, having the schema in place is important. There will also be some automatic actions based on hard criteria (rather than fuzzy "this is probably what the user wants" prediction)... chiefly that when there are original, user-generated files with low durability (only one copy in the world), dmedia will automatically try to upload them to other locations.
First up, we need a new "location" record type. This record will be used to represent both native dmedia storage and storage on various 3rd-party services (S3, UbuntuOne, whatever). There will be a small amount of common schema in the location record, plus other information specific to the type of location. The common schema will look something like this:
{
"_id": LOCATION_UUID,
"record_type": "http://
"added": time.time(),
"durability": 2, # The default durability of this location (per file may differ)
"plugin": NAME_OF_
}
"plugin" will be used to, you guessed it, find the correct dmedia location plugin that knows how to deal with this particular backend. There are 3 plugins that I want right off the bat: dmedia (for native dmedia stored on the permanent file system), dmedia_removable (for native dmedia stored on a removable drive), and s3 (for storing on Amazon S3).
In addition to the common schema above, a dmedia location record might have attributes like this:
{
"plugin": "dmedia",
"machine": MACHINE_UUID,
"path": "/home/
}
Where MACHINE_UUID is a unique ID for a given computer. It should be the same one used in the import records (see lp:680379) and probably the same used by desktopcouch.
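As a rough illustration of how the common schema and the plugin-specific attributes fit together, here is a minimal Python sketch. The helper name `make_location`, the use of `uuid.uuid4()` for the `_id`, and the placeholder values in the example call are all assumptions made for illustration. The real record-type URL and library path are not filled in here.

```python
import time
import uuid


def make_location(plugin, record_type, durability=2, **extra):
    """Build a location record: common schema plus plugin-specific keys."""
    doc = {
        "_id": uuid.uuid4().hex,     # assumption: any unique ID format would do
        "record_type": record_type,  # caller supplies the real record-type URL
        "added": time.time(),
        "durability": durability,    # default durability of this location
        "plugin": plugin,
    }
    doc.update(extra)  # e.g. "machine" and "path" for the "dmedia" plugin
    return doc


# Placeholder values, purely for illustration
loc = make_location(
    "dmedia",
    record_type="RECORD_TYPE_URL",
    machine="MACHINE_UUID",
    path="/path/to/library",
)
```

Keeping the plugin-specific keys in the same flat dict as the common keys mirrors the examples above, so a plugin only has to look up the attributes it knows about.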
Okay, second we need a "locations" attribute in each file record. It will be a dictionary, something like this:
{
"_id": CONTENT_HASH,
"record_type": "http://
"locations": {
"location_
"added": time.time(),
"durability": 1,
},
"location_
"added": time.time(),
"durability": 2,
"policy": "public-read",
}
}
}
So the key in the "locations" dict is the LOCATION_UUID of the location record. The value is itself a dict (to make it easily extensible). It should always have "added" (timestamp of when this file was added to this location) and "durability". Different locations might require other information in the dict in order to retrieve the file (for example, a key assigned by the storage service). On S3 it would be handy to track whether the file is publicly readable (e.g. the "policy": "public-read" entry above).
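One way the "locations" dict could be consumed is to add up the per-location durability values to get an overall figure for a file. The sketch below assumes durability is simply additive across locations, which this report does not actually specify. The function name `total_durability` and the location keys are hypothetical.

```python
def total_durability(file_doc):
    """Sum the per-location durability of a file record.

    Assumption: durability is additive across locations.
    """
    locations = file_doc.get("locations", {})
    return sum(info["durability"] for info in locations.values())


# Hypothetical file record with two locations
example = {
    "_id": "CONTENT_HASH",
    "locations": {
        "LOCATION_A": {"added": 1290000000.0, "durability": 1},
        "LOCATION_B": {"added": 1290000100.0, "durability": 2,
                       "policy": "public-read"},
    },
}

print(total_durability(example))  # 3
```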
And the final thing we need is a "priority" attribute in the file record. In gross terms, all that matters is whether the priority is "original", which means this is user-generated content that cannot be replaced if lost... dmedia will strive to always maintain a (configurable) minimum durability for original files. Any other priority means the file is replaceable one way or another, so the file is basically fair game for reclamation without regard to durability. However, these non-original priorities won't necessarily be treated the same, and this is an area where a smart prediction algorithm would be super sweet.
So the file record with "priority" might look something like this:
{
"_id": CONTENT_HASH,
"record_type": "http://
"locations": {...},
"priority": "original",
}
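To make the hard criteria concrete, here is a minimal sketch of how "priority" and "locations" might drive the upload and reclaim decisions. It is not the dmedia implementation: the function names, the `local_id` and `min_durability` parameters, and the assumption that durability adds up across locations are all illustrative.

```python
def durability_elsewhere(file_doc, local_id):
    """Durability provided by every location except the local one."""
    return sum(
        info["durability"]
        for loc_id, info in file_doc.get("locations", {}).items()
        if loc_id != local_id
    )


def needs_upload(file_doc, min_durability):
    """Original files below the minimum durability should be uploaded."""
    if file_doc.get("priority") != "original":
        return False
    total = sum(i["durability"] for i in file_doc.get("locations", {}).values())
    return total < min_durability


def can_reclaim(file_doc, local_id, min_durability):
    """Decide whether the local copy may be deleted to free space."""
    if local_id not in file_doc.get("locations", {}):
        return False  # nothing stored locally, so nothing to reclaim
    if file_doc.get("priority") != "original":
        return True   # replaceable files are fair game regardless of durability
    # Originals need enough durability left after the local copy is gone
    return durability_elsewhere(file_doc, local_id) >= min_durability


original = {
    "priority": "original",
    "locations": {"local": {"durability": 1}, "s3": {"durability": 2}},
}
proxy = {"priority": "proxy", "locations": {"local": {"durability": 1}}}
```

With these example records, `can_reclaim(original, "local", 2)` is true because the other location alone still meets the minimum, while `can_reclaim(proxy, "local", 2)` is true simply because the file is not an original.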
Some priorities that I think will be useful:
original - don't dare lose these files
downloaded - got it from the Internet, can download it again
paid - can be replaced, but I'd have to pay for it (eg iTunes)
proxy - low res version of an original file, can be transcoded again
cache - temporary files generated for performance reasons (say a pre-render of a node in the Novacut editing graph)
render - if needed original media files and edit description are still available, can be re-rendered (like for Novacut)
There are some priorities (like cache) that don't make sense to track throughout the library. On the other hand, the locations of proxy files are pretty important, especially for Novacut.
This is probably the most critical point of the dmedia design, so I'd love feedback/guidance from stakeholders and anyone who thinks this sounds cool.
Changed in dmedia: | |
status: | New → Triaged |
importance: | Undecided → High |
assignee: | nobody → Jason Gerard DeRose (jderose) |
milestone: | none → 0.2 |
Changed in dmedia: | |
importance: | High → Critical |
Changed in dmedia: | |
milestone: | 0.2 → 0.3 |
Changed in dmedia: | |
milestone: | 0.3 → 0.4 |
Changed in dmedia: | |
assignee: | Jason Gerard DeRose (jderose) → nobody |
importance: | Critical → High |
Changed in dmedia: | |
assignee: | nobody → Jason Gerard DeRose (jderose) |
status: | Triaged → In Progress |
Changed in dmedia: | |
status: | In Progress → Fix Committed |
Changed in dmedia: | |
status: | Fix Committed → Fix Released |
Quick progress report... I've been documenting the rationale for some of the core schema design here:
http://bazaar.launchpad.net/~jderose/dmedia/stores/view/head:/dmedia/schema.py
This provides some much-needed documentation, but also gets the ideas clearer in my own head so I'm better prepared to tackle what is probably the most critical part of the dmedia schema.
I decided that "store" is a bit better/clearer than "location"... so what I call a "location" above in the bug report is what I call a "store" in schema.py.