warc series + warc file names too long
Bug #689994 reported by siznax
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Archive Widecrawl | Fix Committed | Medium | siznax |
Bug Description
In the interest of CDX generation in the deriver, shorten the WARC series name (item identifier) to "job-segment-date", and the WARC file name template to "${prefix}
So, instead of:
WIDE-2010121323
something more like:
WIDE-5-
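A minimal sketch of how the shortened names could be assembled, assuming the series name is simply the job, segment, and date joined by hyphens. The function names are hypothetical, the date value is illustrative, and the file-name pattern `<prefix>-<5-digit serial>.warc.gz` (with the job name as prefix) is an assumption inferred from the example path in the follow-up comment, because the template in the description is truncated.

```python
# Hypothetical sketch of the proposed "job-segment-date" naming scheme.
# Assumption: file names follow "<prefix>-<5-digit serial>.warc.gz", inferred
# from the example path in the follow-up comment (the template is truncated).


def series_name(job: str, segment: int, date: str) -> str:
    """Build the WARC series name (item identifier) as job-segment-date."""
    return f"{job}-{segment}-{date}"


def warc_file_name(prefix: str, serial: int) -> str:
    """Build a WARC file name from a prefix and a zero-padded serial (assumed pattern)."""
    return f"{prefix}-{serial:05d}.warc.gz"


def warc_path(job: str, segment: int, date: str, serial: int) -> str:
    """Join the series directory and the file name; the prefix is assumed to be the job name."""
    return f"{series_name(job, segment, date)}/{warc_file_name(job, serial)}"


if __name__ == "__main__":
    # Illustrative values only.
    print(warc_path("WIDE", 5, "20101213", 395))  # WIDE-5-20101213/WIDE-00395.warc.gz
```

Keeping the identifier to three short fields is what makes it cheap to carry through into CDX lines; the exact date format here is a placeholder.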
We will probably need to add the time to the date to avoid file name collisions when restarting a crawler, e.g.:
WIDE-5-20101213120000/WIDE-00395.warc.gz
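To illustrate why a date alone is not enough, here is a small sketch comparing a date-only series name with a date-plus-time one for two crawler starts on the same day. The helper name and the start times are hypothetical; the 14-digit `%Y%m%d%H%M%S` format is assumed from the example path above. If a restarted crawler writes into the same series directory and its file serial starts over (an assumption), files such as `WIDE-00000.warc.gz` would clash.

```python
from datetime import datetime


def series_name(job: str, segment: int, started: datetime, with_time: bool) -> str:
    """Hypothetical helper: job-segment-date, optionally with the time of day appended."""
    fmt = "%Y%m%d%H%M%S" if with_time else "%Y%m%d"
    return f"{job}-{segment}-{started.strftime(fmt)}"


# Two illustrative starts of the same job segment on the same day.
first_start = datetime(2010, 12, 13, 12, 0, 0)
restart = datetime(2010, 12, 13, 18, 30, 0)

if __name__ == "__main__":
    # Date only: both starts map to the same series name (collision).
    print(series_name("WIDE", 5, first_start, False), series_name("WIDE", 5, restart, False))
    # Date plus time: each start gets its own series name.
    print(series_name("WIDE", 5, first_start, True), series_name("WIDE", 5, restart, True))
```

Second-level resolution only guards against restarts that are at least a second apart, which seems adequate for manual crawler restarts.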