need script(s) to monitor disk space free, pause crawlers

Bug #670598 reported by siznax
This bug affects 1 person
Affects            Status     Importance  Assigned to  Milestone
Archive Widecrawl  Confirmed  Medium      siznax

Bug Description

Heritrix may not monitor free disk space and pause crawls for us without modification.

Here's what I found when my crawl ran out of space:

 * many alerts
 * could not pause or terminate the crawl

  100GB job.log on /0
  478GB heritrix_out.log on /0 (millions of "not canonicalized http://..." lines)
  500GB alerts.log on /1
   17GB crawl.log on /1
  500GB state/ dir on /2
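A listing like the one above can be gathered with a short script. This is a minimal sketch, assuming Python 3 and that the mounts (/0, /1, /2 here) are readable directories; the function names `large_files`, `human` and `report` are hypothetical and not part of Heritrix. Sizes are printed in decimal units (1GB = 10^9 bytes).

```python
import os
import stat
from typing import Iterable, List, Tuple


def large_files(roots: Iterable[str], min_bytes: int) -> List[Tuple[int, str]]:
    """Return (size, path) for every regular file under `roots` that is at
    least `min_bytes` long, largest first."""
    found: List[Tuple[int, str]] = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    # lstat() so symlinks are not followed to files elsewhere.
                    st = os.lstat(path)
                except OSError:
                    # File vanished between listing and stat (e.g. a rotated log).
                    continue
                if stat.S_ISREG(st.st_mode) and st.st_size >= min_bytes:
                    found.append((st.st_size, path))
    found.sort(reverse=True)
    return found


def human(n: int) -> str:
    """Format a byte count in decimal units, e.g. 500_000_000_000 -> '500GB'."""
    value = float(n)
    for unit in ("B", "KB", "MB", "GB"):
        if value < 1000:
            return f"{value:.0f}{unit}"
        value /= 1000
    return f"{value:.0f}TB"


def report(roots: Iterable[str], min_bytes: int) -> None:
    """Print one line per large file: size, then path."""
    for size, path in large_files(roots, min_bytes):
        print(f"{human(size):>7} {path}")
```

For example, `report(["/0", "/1", "/2"], 10 * 1000**3)` would print every file of 10GB or more on those mounts, largest first. Running it periodically and comparing the output is one cheap way to cover the "monitor log file sizes?" item.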

heritrix_out.log, alerts.log, or the state/ dir may have caused the failure.

I guess we'll need scripts to guard against filling disks:

* periodically empty state/*.del files
* monitor disk free, pause when X% full
* monitor log file sizes?
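The checks above could start out as one small periodic script. This is a minimal sketch in Python 3 under several assumptions: the state directory path and the X% threshold are supplied by the operator; deleting `state/*.del` is safe (as proposed above, not verified here); and `pause_crawl` is a hypothetical callback, because how a running Heritrix job actually gets paused (web UI, or whatever control interface the installed version offers) is left to the operator. Each pass frees `.del` space first and only asks for a pause if some disk is still at or above the threshold.

```python
import glob
import os
import shutil
import time
from typing import Callable, Dict, Iterable

# Hypothetical hook: receives {path: percent_full} and should pause the crawl.
PauseHook = Callable[[Dict[str, float]], None]


def percent_full(path: str) -> float:
    """Percent of the filesystem holding `path` that is in use, computed like
    df's Use% column: used / (used + space available to unprivileged users)."""
    usage = shutil.disk_usage(path)
    denom = usage.used + usage.free
    return 100.0 * usage.used / denom if denom else 100.0


def full_filesystems(paths: Iterable[str], threshold_pct: float) -> Dict[str, float]:
    """Return {path: percent_full} for each path at or above threshold_pct."""
    full = {}
    for path in paths:
        pct = percent_full(path)
        if pct >= threshold_pct:
            full[path] = pct
    return full


def purge_del_files(state_dir: str) -> int:
    """Delete state_dir/*.del and return the number of bytes freed.
    Assumes, as the bug report proposes, that these files are safe to remove."""
    freed = 0
    for path in glob.glob(os.path.join(state_dir, "*.del")):
        try:
            size = os.path.getsize(path)
            os.remove(path)
        except OSError:
            continue  # already gone or not removable; try again next pass
        freed += size
    return freed


def check_once(paths: Iterable[str], threshold_pct: float,
               state_dir: str, pause_crawl: PauseHook) -> bool:
    """One pass: reclaim .del space first, then pause if any disk is still
    too full. Returns True if pause_crawl was called."""
    purge_del_files(state_dir)
    full = full_filesystems(paths, threshold_pct)
    if full:
        pause_crawl(full)
        return True
    return False


def monitor(paths: Iterable[str], threshold_pct: float, state_dir: str,
            pause_crawl: PauseHook, interval_s: float = 60.0) -> None:
    """Repeat check_once every interval_s seconds, forever."""
    paths = list(paths)  # allow a generator to be reused on every pass
    while True:
        check_once(paths, threshold_pct, state_dir, pause_crawl)
        time.sleep(interval_s)
```

For example, `monitor(["/0", "/1", "/2"], 90.0, "/2/state", my_pause)` would check once a minute, where `my_pause` is whatever you write to pause the job. The hook is called on every pass while a disk stays full, so it should tolerate being asked to pause an already-paused crawl. Since the crawl could not be paused once the disks were full, the threshold likely needs to leave headroom well before 100%.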

can anyone think of anything else?

siznax (siznax)
summary: - need script(s) to monitor disk space free
+ need script(s) to monitor disk space free, pause crawlers
siznax (siznax)
Changed in archivewidecrawl:
assignee: nobody → siznax (siznax)
importance: Undecided → Medium
status: New → Confirmed