nagios check's methodology is suboptimal (indexer failures)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Graylog Charm | Fix Released | Medium | Unassigned |
Bug Description
When there are failed ingestions less than 24 hours old, the Nagios check returns critical. This means that once the underlying problem is resolved (assuming it wasn't transient), the only ways to clear the alert are to a) downtime or ignore the alert until the failures age out (24 hours minus the event age), or b) remove the errors from the logging collection.
The second option is complicated by the fact that the MongoDB collection is a capped collection, which does not support deleting documents. That leaves dropping and recreating the collection as the only option (assuming Graylog doesn't recreate it on demand; does it?).
The Nagios check could help a little by supporting a way to tell it that events older than a certain age have been dealt with. This could be as simple as having the operator touch a file on disk (bonus points if the check mentions that file in its output), but there may be other options I haven't considered.
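A rough sketch of the touch-a-file idea, to make the proposal concrete. This is not the code from the related branches; the marker path `ACK_FILE`, the helper names `ack_time` and `check_failures`, and the message wording are all hypothetical. It assumes the check already has the failure timestamps as timezone-aware datetimes.

```python
import os
from datetime import datetime, timedelta, timezone

# Hypothetical marker path; the operator would 'touch' it once failures are handled.
ACK_FILE = "/var/lib/nagios/graylog_indexer_failures.ack"
WINDOW = timedelta(hours=24)

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def ack_time(path=ACK_FILE):
    """Return the marker file's mtime as an aware datetime, or None if absent."""
    try:
        return datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)
    except FileNotFoundError:
        return None


def check_failures(failure_times, now, acked_at=None, ack_path=ACK_FILE):
    """Return (exit_code, message) for a list of failure timestamps.

    Failures older than 24h are ignored, as are failures at or before
    the acknowledgement time, whichever cutoff is more recent.
    """
    cutoff = now - WINDOW
    if acked_at is not None and acked_at > cutoff:
        cutoff = acked_at
    recent = [t for t in failure_times if t > cutoff]
    if recent:
        msg = (
            "CRITICAL: {} indexer failure(s) since {}; "
            "after resolving, run 'touch {}' to acknowledge".format(
                len(recent), cutoff.isoformat(), ack_path
            )
        )
        return CRITICAL, msg
    return OK, "OK: no unacknowledged indexer failures in the last 24h"
```

The check would call `check_failures(times, datetime.now(timezone.utc), ack_time())`. Failures that occur after the file was touched still alert, so acknowledging old events does not hide a recurrence.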
Related branches
- Xav Paice (community): Approve
- Jeremy Lounder: Pending requested
- Canonical IS Reviewers: Pending requested
- Diff: 114 lines (+35/-2), 5 files modified:
  - actions.yaml (+4/-0)
  - actions/actions.py (+10/-0)
  - actions/ignore-indexer-failures (+1/-0)
  - files/check_graylog_health.py (+16/-2)
  - lib/charms/layer/graylog/api.py (+4/-0)
- Tom Haddon: Approve
- Canonical IS Reviewers: Pending requested
- Junien F: Pending requested
- Canonical IS Reviewers: Pending requested
- Diff: 21 lines (+2/-1), 1 file modified:
  - files/check_graylog_health.py (+2/-1)
- Joe Guo (community): Approve
- Junien F: Disapprove
- Tom Haddon: Approve
- Canonical IS Reviewers: Pending requested
- Diff: 22 lines (+3/-1), 1 file modified:
  - files/check_graylog_health.py (+3/-1)
tags: added: canonical-bootstack
summary: nagios check's methodology is suboptimal → nagios check's methodology is suboptimal (indexer failures)
Changed in graylog-charm:
status: New → Triaged
importance: Undecided → Medium
Changed in graylog-charm:
status: Triaged → Fix Released
Changed in charm-graylog:
status: Triaged → Fix Released
This will be fixed by https://code.launchpad.net/~woutervb/graylog-charm/+git/graylog-charm/+merge/367556