Comment 10 for bug 435886

Robert Collins (lifeless) wrote : Re: Need a way to monitor mailman via nagios

So I think Tom is asking for a functional check that it's 'all good'; in e.g. LP we don't actually mutate data in such checks.

If mailman cannot tell that its own children are healthy, that seems like a mailman issue we should delegate to mailman.

Doing a full end-to-end check on a dedicated private list would probably work, and if we add a 'zap list contents' facility it needn't accumulate too much data.

So, I'm going to split this up as follows:
 - it's a nagios problem (i.e. not an LP codebase issue) to send a mail to a list, poll for the response back on a known address that nagios gets, and check it shows up in the archive.
 - but we need to write this script, and probably need to have it run asynchronously from the nagios checks: e.g. it writes 'OK' every N minutes to a log file, and the nagios check is then 'is the OK in the log file < N+1 minutes old', where N is the maximum latency we're willing to tolerate for things flowing through mailman.
 - we need to manually create a dedicated private list for this monitoring to happen on, and the address for the response.
 - we need a code change to permit easy nuking of list contents from time to time.
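
To make the 'is the OK in the log file < N+1 minutes old' check concrete, here is a rough sketch of what the nagios side could look like. It is only an illustration under some assumptions: the function names (check_freshness, main) and the log layout are made up, the file's modification time is taken as the time of the last write, and the last line is expected to be exactly 'OK'. The exit codes are the usual nagios plugin ones (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import os
import time

# Standard nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_freshness(log_path, max_latency_minutes, now=None):
    """Return (exit_code, message) for the heartbeat log at log_path.

    Assumption: the async end-to-end script appends a line 'OK' after each
    successful round trip, so the file's mtime is the time of the last entry.
    """
    now = time.time() if now is None else now
    try:
        with open(log_path) as f:
            lines = f.read().splitlines()
        mtime = os.path.getmtime(log_path)
    except OSError as e:
        return UNKNOWN, "UNKNOWN: cannot read %s: %s" % (log_path, e)

    # The most recent entry must be a success marker.
    if not lines or lines[-1].strip() != "OK":
        return CRITICAL, "CRITICAL: last log entry is not OK"

    # The success marker must be less than N+1 minutes old.
    age_minutes = (now - mtime) / 60.0
    if age_minutes < max_latency_minutes + 1:
        return OK, "OK: last successful round trip %.1f minutes ago" % age_minutes
    return CRITICAL, "CRITICAL: last OK is %.1f minutes old" % age_minutes


def main(argv):
    """Hypothetical entry point: argv is [program, log_path, N_minutes]."""
    if len(argv) != 3:
        print("usage: %s LOG_PATH MAX_LATENCY_MINUTES" % argv[0])
        return UNKNOWN
    try:
        max_latency = float(argv[2])
    except ValueError:
        print("UNKNOWN: MAX_LATENCY_MINUTES must be a number")
        return UNKNOWN
    code, message = check_freshness(argv[1], max_latency)
    print(message)
    return code
```

A real plugin would finish with something like sys.exit(main(sys.argv)) so that nagios sees the exit code. Keeping the check this dumb means the slow part (sending mail, polling for the response, checking the archive) stays out of the nagios poll entirely.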