Validate WARC contents (warcvalidator) before upload

Bug #661520 reported by siznax
This bug affects 1 person
Affects: Archive Widecrawl
Status: Confirmed
Importance: Medium
Assigned to: siznax
Milestone: (none)

Bug Description

Consider using the "warcvalidator" tool from Hanzo's warc-tools to validate WARC contents before uploading.

If warcvalidator fails, set the WARC aside (e.g. rename it with a ".warc.gz.invalid" suffix) instead of uploading it.

See http://code.google.com/p/warc-tools/
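A minimal Python sketch of the validate-then-set-aside step, for illustration only. The function name `validate_before_upload` is hypothetical, and so is the calling convention: it assumes the validator takes the WARC path as its last argument and signals failure with a nonzero exit status. Neither assumption comes from warcvalidator's documentation, so check the tool before relying on them.

```python
import os
import subprocess
from typing import Sequence


def validate_before_upload(warc_path: str, validator_cmd: Sequence[str]) -> bool:
    """Hypothetical helper: run an external validator and set the WARC aside on failure.

    Assumptions (not taken from warcvalidator's docs):
    - the validator accepts the WARC path as its final argument;
    - a nonzero exit status means the WARC is invalid.

    Returns True if the WARC passed and may be uploaded. Returns False after
    renaming it to "<warc_path>.invalid" so the upload step skips it.
    """
    # If the validator executable is missing, subprocess.run raises
    # FileNotFoundError; the WARC is left untouched in that case.
    result = subprocess.run([*validator_cmd, warc_path], capture_output=True, check=False)
    if result.returncode == 0:
        return True
    # e.g. crawl-00001.warc.gz -> crawl-00001.warc.gz.invalid
    os.replace(warc_path, warc_path + ".invalid")
    return False
```

With the real tool, `validator_cmd` might be `["warcvalidator"]` plus whatever options it requires; no options are shown here because none are specified in this report.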

Tags: drain warc
siznax (siznax)
Changed in archivewidecrawl:
status: New → Confirmed
importance: Undecided → Low
siznax (siznax)
Changed in archivewidecrawl:
importance: Low → Medium
assignee: nobody → siznax (siznax)
siznax (siznax) wrote:

On second thought, we may want to use Heritrix's WARCReader instead.
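Heritrix's WARCReader is a Java class, and its API is not reproduced here. As a rough illustration of the same idea (checking a WARC by parsing it record by record rather than shelling out to a separate validator), here is a minimal Python sketch based on the WARC 1.0 record layout: a `WARC/` version line, header fields, a blank line, a block of `Content-Length` bytes, and a CRLF CRLF terminator. `check_warc_structure` is a hypothetical name, and the check covers framing only, not digests or field semantics.

```python
import gzip
from typing import IO


def _open_warc(path: str) -> IO[bytes]:
    # Python's gzip reader transparently reads concatenated gzip members,
    # so a per-record-compressed .warc.gz reads as one uncompressed stream.
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


def check_warc_structure(path: str) -> int:
    """Hypothetical helper: walk every record and return the record count.

    Raises ValueError on the first framing problem. A corrupt or truncated
    gzip stream surfaces as an OSError or EOFError from the gzip module.
    """
    count = 0
    with _open_warc(path) as f:
        while True:
            line = f.readline()
            if not line:
                return count  # clean end of input
            if not line.startswith(b"WARC/"):
                raise ValueError(f"record {count}: bad version line {line!r}")
            length = None
            while True:
                header = f.readline()
                if not header:
                    raise ValueError(f"record {count}: truncated header")
                if header == b"\r\n":
                    break  # blank line ends the header block
                if header[:1] in (b" ", b"\t"):
                    continue  # folded continuation of the previous field
                name, sep, value = header.partition(b":")
                if not sep:
                    raise ValueError(f"record {count}: malformed header {header!r}")
                if name.strip().lower() == b"content-length":
                    value = value.strip()
                    if not value.isdigit():
                        raise ValueError(f"record {count}: bad Content-Length {value!r}")
                    length = int(value)
            if length is None:
                raise ValueError(f"record {count}: missing Content-Length")
            if len(f.read(length)) != length:
                raise ValueError(f"record {count}: truncated block")
            if f.read(4) != b"\r\n\r\n":
                raise ValueError(f"record {count}: missing CRLF CRLF terminator")
            count += 1
```

A failure here could be handled the same way as a failed warcvalidator run: set the WARC aside instead of uploading it.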
