need script(s) to monitor disk space free, pause crawlers
Bug #670598 reported by siznax
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Archive Widecrawl | Confirmed | Medium | siznax |
Bug Description
Heritrix may not do it for us without modification.
here's what i found when my crawl ran out of space:
* many alerts
* could not pause or terminate
100GB job.log on /0
478GB heritrix_out.log on /0 (millions of "not canonicalized http://..." lines)
500GB alerts.log on /1
17GB crawl.log on /1
500GB state/ dir on /2
heritrix_out.log, alerts.log or the state/ dir may have caused the failure.
i guess we'll need scripts to guard against filling disks:
* periodically empty state/*.del files
* monitor disk free, pause when X% full
* monitor log file sizes?
can anyone think of anything else?
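
a rough sketch of what such a guard script might look like, in Python. This is only an illustration under assumptions, not a tested tool: the function names, the threshold and the `pause_crawl` callback are all hypothetical, and how the crawler actually gets paused is left to the caller since that depends on the Heritrix setup. It also assumes that truncating state/*.del files is safe while a crawl is running, which has not been verified here.

```python
import os
import shutil
from pathlib import Path


def percent_used(path):
    """Return how full the filesystem holding `path` is, as a percentage.

    Uses used/total, so blocks reserved for root count as not yet used.
    """
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def oversized_files(paths, max_bytes):
    """Return the files in `paths` that exist and are larger than `max_bytes`."""
    return [p for p in paths
            if os.path.isfile(p) and os.path.getsize(p) > max_bytes]


def empty_del_files(state_dir):
    """Truncate every *.del file directly under `state_dir` to zero bytes.

    Assumption: emptying these files is safe for the running crawl.
    Returns the number of files truncated.
    """
    count = 0
    for f in Path(state_dir).glob("*.del"):
        os.truncate(f, 0)
        count += 1
    return count


def check_once(mounts, threshold_pct, pause_crawl, usage_fn=percent_used):
    """Call `pause_crawl(mount, pct)` for each mount at or above the threshold.

    `pause_crawl` is a hypothetical hook supplied by the caller; this sketch
    does not know how to talk to Heritrix. Returns the list of full mounts.
    """
    full = []
    for mount in mounts:
        pct = usage_fn(mount)
        if pct >= threshold_pct:
            pause_crawl(mount, pct)
            full.append(mount)
    return full
```

something like `check_once(["/0", "/1", "/2"], 90, my_pause_hook)` could then run from cron every few minutes, with `oversized_files` pointed at job.log, heritrix_out.log, alerts.log and crawl.log to catch runaway logs before the disk-level threshold trips.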
summary:
- need script(s) to monitor disk space free
+ need script(s) to monitor disk space free, pause crawlers
Changed in archivewidecrawl:
assignee: nobody → siznax (siznax)
importance: Undecided → Medium
status: New → Confirmed