it turns out that the "jobnode" of a hashcrawlmapped crawler is not easily accessible, so we'll need to use the short hostname instead, i.e.
WIDE-ia360913-20101213120000/WIDE-20101213120000-00395.warc.gz
if we allow the crawler to write files with its safe file-naming convention, but rename the files on upload, then we could conceivably drop the job prefix and timestamp from warc_filename for something like:
WIDE-ia360913-20101213120000/00395.warc.gz
however, i'm not certain of the risk in creating item members with non-unique filenames.
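a rough sketch of the rename-on-upload idea, in python, mostly to make the uniqueness concern concrete. the filename pattern, the function names, and the second item name (ia360914) are assumptions for illustration only; none of this is taken from the crawler or the uploader:

```python
import posixpath
import re
from collections import Counter

# assumed shape of the crawler's "safe" name:
#   <job prefix>-<14-digit timestamp>-<5-digit serial>.warc.gz
SAFE_NAME = re.compile(
    r"^(?P<prefix>[A-Za-z0-9]+)-(?P<ts>\d{14})-(?P<serial>\d{5})\.warc\.gz$"
)


def member_path(safe_name, item, short=False):
    """Return '<item>/<member>' for a WARC written under its safe name.

    With short=True the job prefix and timestamp are dropped on upload,
    leaving only the serial number as the member filename.
    """
    m = SAFE_NAME.match(safe_name)
    if m is None:
        raise ValueError("unexpected warc filename: %r" % safe_name)
    member = "%s.warc.gz" % m["serial"] if short else safe_name
    return "%s/%s" % (item, member)


def duplicate_basenames(paths):
    """Return the set of member filenames that occur under more than one path."""
    counts = Counter(posixpath.basename(p) for p in paths)
    return {name for name, n in counts.items() if n > 1}


safe = "WIDE-20101213120000-00395.warc.gz"
items = [
    "WIDE-ia360913-20101213120000",
    "WIDE-ia360914-20101213120000",  # hypothetical second crawler host
]

long_paths = [member_path(safe, item) for item in items]
short_paths = [member_path(safe, item, short=True) for item in items]

print(long_paths[0])
print(short_paths[0])
print(duplicate_basenames(short_paths))
```

in this sketch the full item/member paths stay distinct either way, but with the short form the bare member filename repeats across items, and so does the long form whenever two hosts write the same prefix, timestamp, and serial. anything that looks files up by bare filename rather than by item/filename would see those collisions.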