Internet Archive - Tech Support

Bug #1049106
Comment #1

Hank Bromley (hank-archive) wrote on 2012-09-11:

In the ordinary case, we extract scandate, scanner, operator (and, when present, missingpages, foldout-operator, and republisher) from the scandata file at the beginning of the derive. Sheetfed items have no scandata file initially - although a bare-bones one is created for them during the derive - so there's no place to extract those metadata from. It's true that some of that info is visible in the task history, but most is not, and in any case we have no mechanism for finding such metadata in the task history and using it.

I don't remember whether we discussed these metadata when the sheetfed process was set up, but if we do need to have scandate, etc., we'll need to establish some mechanism for uploading and extracting that info.

Two ideas occur to me right off: (1) the upload script could add a next_cmd arg for a modify_xml.php task that would add the desired info to meta.xml, and (2) we could make use of the metadata derivation pathway - you'd upload an additional simple file containing the element names and values, and we'd add those to meta.xml during the derive.
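
To make idea (2) a bit more concrete, here is a rough Python sketch of what "add those to meta.xml during the derive" might look like. Everything in it is an assumption for illustration only: the "name: value" line format for the extra file, the function names parse_extra_metadata and merge_into_meta, and the <metadata> root element in the sample. It is not the existing derive code.

```python
import xml.etree.ElementTree as ET


def parse_extra_metadata(text):
    """Parse hypothetical 'name: value' lines into (name, value) pairs.

    Blank lines and lines starting with '#' are skipped. Only the first
    colon separates name from value, so values may contain colons.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'name: value', got {line!r}")
        pairs.append((name.strip(), value.strip()))
    return pairs


def merge_into_meta(meta_xml, pairs):
    """Return meta_xml with each (name, value) set as a top-level element.

    An existing element with the same name is overwritten; otherwise a
    new child is appended to the root.
    """
    root = ET.fromstring(meta_xml)
    for name, value in pairs:
        elem = next((child for child in root if child.tag == name), None)
        if elem is None:
            elem = ET.SubElement(root, name)
        elem.text = value
    return ET.tostring(root, encoding="unicode")


# Sample input; all values here are made up.
extra = "scandate: 20120911\nscanner: sheetfed-01\noperator: jdoe\n"
meta = "<metadata><identifier>example-item</identifier></metadata>"
updated = merge_into_meta(meta, parse_extra_metadata(extra))
```

Whether an uploaded value should overwrite one already in meta.xml, as this sketch does, or be skipped is a policy question we'd have to settle either way.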