Comment 3 for bug 689994

siznax (siznax) wrote :

it turns out that the "jobnode" of a mapped hashcrawlmapped crawler is not easily accessible, so we'll need to use the short hostname instead, i.e.

WIDE-ia360913-20101213120000/WIDE-20101213120000-00395.warc.gz

if we allow the crawler to write files with its safe filenaming convention, but rename the files on upload, then we could conceivably drop the job prefix and timestamp from warc_filename for something like:

WIDE-ia360913-20101213120000/00395.warc.gz

however, i'm not certain of the risk in creating item members with non-unique filenames.
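
a rough sketch of the rename-on-upload idea, to make the proposal concrete. the function name `upload_name` and the filename pattern (prefix, 14-digit timestamp, serial, `.warc.gz`) are assumptions inferred from the examples above, not the crawler's actual naming code:

```python
import re

# Assumed crawler filename shape, inferred from the example above:
#   <prefix>-<14-digit timestamp>-<serial>.warc.gz
WARC_NAME = re.compile(
    r"^(?P<prefix>[^-]+)-(?P<ts>\d{14})-(?P<serial>\d+)\.warc\.gz$"
)


def upload_name(item, crawler_filename):
    """Return the item member path, keeping only the serial from the filename.

    `item` is the item identifier (prefix, short hostname, timestamp);
    the job prefix and timestamp are dropped from the member name.
    """
    m = WARC_NAME.match(crawler_filename)
    if m is None:
        # Refuse to guess at names that do not fit the assumed pattern.
        raise ValueError("unexpected warc filename: %r" % crawler_filename)
    return "%s/%s.warc.gz" % (item, m.group("serial"))


if __name__ == "__main__":
    print(upload_name("WIDE-ia360913-20101213120000",
                      "WIDE-20101213120000-00395.warc.gz"))
```

note that the member part of the result (00395.warc.gz) is only unique within its item, so any two items holding a file with the same serial would share a member filename, which is the risk mentioned above.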