upload or derive crawl logs
Bug #661524 reported by siznax
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Archive Widecrawl | Confirmed | Medium | Unassigned | |
Bug Description
as discussed...
1) upload crawl logs with draintasker - have draintasker look for timestamped crawl log(s) on each pass, then create a manifest of warc series (item identifiers) that correspond to that log, then upload the log and manifest into a new item.
2) derive crawl log from warcs - when a warc series is derived, write an equivalent crawl log from the warc content.
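For option (1), the manifest step could look roughly like the sketch below. It is hypothetical, not draintasker's actual code. It assumes that rotated logs are named `crawl.log.YYYYMMDDhhmmss`, with the suffix being the time the log was closed. It also assumes that for each uploaded warc series item, the times of its first and last WARC are known. Under those assumptions, a log covers the interval from the previous log's close time up to its own. A series belongs in a log's manifest if its time span overlaps that interval, so a series that straddles a rotation shows up in both manifests. The names `build_manifests`, `manifest_text`, and the `WIDE-...` identifiers in the usage example are all invented for illustration.

```python
import re
from datetime import datetime

# Hypothetical naming: crawl.log.YYYYMMDDhhmmss, suffix = time the log was closed.
LOG_NAME = re.compile(r"crawl\.log\.(\d{14})")


def log_close_time(name: str) -> datetime:
    """Parse the close time out of a timestamped crawl log filename."""
    m = LOG_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"not a timestamped crawl log: {name}")
    return datetime.strptime(m.group(1), "%Y%m%d%H%M%S")


def build_manifests(
    log_names: list[str], series: dict[str, tuple[datetime, datetime]]
) -> dict[str, list[str]]:
    """Map each closed crawl log to the warc series items that overlap it.

    series maps item identifier -> (first WARC time, last WARC time).
    A log covers (previous log's close time, its own close time].
    """
    manifests: dict[str, list[str]] = {}
    start = datetime.min  # the earliest log has no predecessor
    for name in sorted(log_names, key=log_close_time):
        end = log_close_time(name)
        manifests[name] = sorted(
            item for item, (first, last) in series.items()
            if first <= end and last > start  # spans overlap
        )
        start = end
    return manifests


def manifest_text(items: list[str]) -> str:
    """Plain-text manifest: one item identifier per line."""
    return "".join(f"{item}\n" for item in items)
```

Series whose WARCs were all written after the newest closed log get no manifest yet. They would be picked up on a later pass, once that log is rotated.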
please discuss in comments.
i guess the first step is to determine if warcs currently contain enough information to (sufficiently) reproduce a crawl log.
if so, then we can just "rederive" already uploaded warc series.
if not, then we'll need to upload existing crawl logs as described in (1) above.
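One way to check whether the warcs carry enough information is to try building crawl-log lines from `response` records and see which fields stay empty. The sketch below is a minimal, stdlib-only attempt. It reads *uncompressed* WARC files only; real crawl output is usually gzipped per record, so a real tool would need a proper WARC library. It assumes the 12-field Heritrix crawl.log layout: log time, status, size, URI, discovery path, referrer, MIME type, thread, fetch timestamp+duration, digest, source tag, annotations. Fields not found in the response record or its HTTP headers are written as `-`.

Two further assumptions are flagged in the comments. `WARC-Date` (capture time) stands in for the log-write time. The size is the payload length, which may not match how Heritrix counts it. The function names are hypothetical.

```python
import io
from collections.abc import Iterator


def iter_warc_records(stream: io.BufferedIOBase) -> Iterator[tuple[dict[str, str], bytes]]:
    """Yield (headers, block) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers: dict[str, str] = {}
        while True:
            raw = stream.readline()
            if raw.strip() == b"":  # blank line (or EOF) ends the header block
                break
            name, _, value = raw.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        yield headers, stream.read(int(headers["Content-Length"]))


def crawl_log_line(headers: dict[str, str], block: bytes) -> str:
    """Build a Heritrix-style crawl.log line from one WARC 'response' record."""
    http_head, _, payload = block.partition(b"\r\n\r\n")
    head_lines = http_head.decode("latin-1").split("\r\n")
    status = head_lines[0].split()[1]  # "HTTP/1.1 200 OK" -> "200"
    mime = "-"
    for header in head_lines[1:]:
        name, _, value = header.partition(":")
        if name.strip().lower() == "content-type":
            mime = value.split(";")[0].strip()
    return " ".join([
        headers["WARC-Date"],          # capture time, standing in for log-write time (assumption)
        status,                        # fetch status code
        str(len(payload)),             # size: payload bytes only (assumption)
        headers["WARC-Target-URI"],
        "-",                           # discovery (hop) path: not in these headers
        "-",                           # referrer: not in these headers
        mime,
        "-",                           # worker thread
        "-",                           # fetch timestamp+duration
        headers.get("WARC-Payload-Digest", "-"),
        "-",                           # source tag
        "-",                           # annotations
    ])


def derive_crawl_log(stream: io.BufferedIOBase) -> list[str]:
    """One crawl-log-style line per response record; other record types are skipped."""
    return [
        crawl_log_line(h, b)
        for h, b in iter_warc_records(stream)
        if h.get("WARC-Type") == "response"
    ]
```

The `-` columns are the open question from the comment above. If no other record type in the series supplies them, "rederiving" can only produce a partial crawl log, and uploading the original logs as in (1) would still be needed.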