nagios check's methodology is suboptimal (indexer failures)

Bug #1805043 reported by Paul Collins
This bug affects 2 people
Affects: Graylog Charm
Status: Fix Released
Importance: Medium
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

When there are failed ingestions less than 24 hours old, the Nagios check returns critical. This means that, once the underlying problem is resolved (assuming the problem wasn't transient), the only options for clearing the alert are to a) downtime or ignore it until the newest failure ages out, that is, for 24h minus the event's age, or b) remove the errors from the logging collection.

The second option is complicated by the fact that the MongoDB collection is a capped collection, which does not support deleting documents. The only remaining option is to delete and recreate the collection (assuming Graylog doesn't recreate it on demand; does it?).
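For illustration, the drop-and-recreate workaround could look roughly like the sketch below. It assumes `db` behaves like a PyMongo `Database` object (it uses only `db[name].options()`, `db.drop_collection()` and `db.create_collection()`); the helper name `recreate_capped_collection` is hypothetical, and the collection name must be supplied by the operator, since this report does not name it. Whether Graylog would recreate the collection on its own remains the open question asked above.

```python
from typing import Any, Dict


def recreate_capped_collection(db: Any, name: str) -> Dict[str, Any]:
    """Drop a capped collection and recreate it empty with the same options.

    `db` is assumed to behave like a pymongo Database. This is destructive:
    every document in the collection is lost.
    """
    # Read the existing options (for a capped collection, typically
    # {"capped": True, "size": <bytes>} plus an optional "max").
    options = dict(db[name].options())
    if not options.get("capped"):
        # Ordinary collections support deletes, so dropping is unnecessary.
        raise ValueError(f"{name!r} is not a capped collection")
    db.drop_collection(name)
    # Recreate with the options read above so the size cap is preserved.
    db.create_collection(name, **options)
    return options
```

With a real deployment, `db` would come from something like `pymongo.MongoClient()[...]`, with the database and collection names taken from the Graylog installation rather than from this sketch.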

The Nagios check could help a little by supporting a way of telling it that events older than a certain age have been dealt with. This could be as simple as having the operator touch a file on disk; for bonus points, the check could mention that file in its output. There may be other options I haven't considered.
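A minimal sketch of the touch-file idea follows. It is not the charm's actual check: the marker path `ACK_FILE`, the function names, and the representation of failures as Unix timestamps are all assumptions made for illustration. The only facts taken from the report are the 24-hour window and the critical result; the exit codes are the standard Nagios plugin values.

```python
import os
from typing import Iterable, Optional, Tuple

OK, CRITICAL = 0, 2          # standard Nagios plugin exit codes
WINDOW_SECONDS = 24 * 3600   # the 24-hour look-back described in the report
# Hypothetical marker path; the operator runs `touch` on it to acknowledge.
ACK_FILE = "/var/lib/nagios/graylog_indexer_failures.ack"


def ack_time(path: str = ACK_FILE) -> Optional[float]:
    """Return the marker file's mtime, or None if it does not exist."""
    try:
        return os.stat(path).st_mtime
    except FileNotFoundError:
        return None


def check_failures(
    failure_times: Iterable[float],
    now: float,
    acked_at: Optional[float],
    ack_path: str = ACK_FILE,
) -> Tuple[int, str]:
    """Return (exit_code, message), ignoring failures already acknowledged."""
    # Only failures inside the 24h window count...
    cutoff = now - WINDOW_SECONDS
    # ...and, if the operator touched the marker, only those after the touch.
    if acked_at is not None:
        cutoff = max(cutoff, acked_at)
    recent = [t for t in failure_times if t > cutoff]
    if recent:
        return (
            CRITICAL,
            f"CRITICAL: {len(recent)} unacknowledged indexer failure(s); "
            f"once resolved, run 'touch {ack_path}' to acknowledge",
        )
    return OK, "OK: no unacknowledged indexer failures in the last 24h"
```

A plugin wrapper would call `check_failures(timestamps, time.time(), ack_time())`, print the message, and exit with the returned code. Because the acknowledgement is just a file mtime, any failure that occurs after the touch raises the alert again without further operator action.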

Alvaro Uria (aluria)
tags: added: canonical-bootstack
Haw Loeung (hloeung)
summary: - nagios check's methodology is suboptimal
+ nagios check's methodology is suboptimal (indexer failures)
Haw Loeung (hloeung)
Changed in graylog-charm:
status: New → Triaged
importance: Undecided → Medium
Wouter van Bommel (woutervb) wrote :
Changed in graylog-charm:
status: Triaged → Fix Released
Haw Loeung (hloeung) wrote :

Linked MPs are still waiting on review?

Changed in graylog-charm:
status: Fix Released → Triaged
Changed in charm-graylog:
status: Triaged → Fix Released