nrpe default thresholds are too lax

Bug #1989154 reported by Andrea Ieri
Affects: Ceph Monitor Charm
Status: Fix Committed
Importance: Low
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

The current default tuning of the check_ceph_status nrpe check is quite lax, which makes it hard for the check to ever go critical.
However, I think charms should in general default to deploying rather noisy checks and let operators tune them down for their environment, rather than the opposite. This is for two reasons:
1. be secure by default / opt _into_ insecurity
2. as an operator, you may never learn a check exists if you don't get alerted by it

Specifically, I would propose the following changes in defaults:

nagios_degraded_thresh: from 1 to 0.1
nagios_misplaced_thresh: from 1 to 0.1
nagios_recovery_rate: from 1 to 100
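
To illustrate the effect of the proposed defaults, here is a minimal sketch. It is not the charm's actual check_ceph_status.py logic: the evaluate() function, the dictionary keys, and the comparison rules are hypothetical, and it assumes the two thresholds are ratios (1 = 100%) and that the recovery rate is a minimum in objects per second.

```python
# Hypothetical sketch, NOT the charm's check_ceph_status.py implementation.
# Assumptions: degraded/misplaced thresholds are ratios (1.0 == 100%), and
# recovery_rate is the minimum acceptable recovery speed in objects/s.

OLD_DEFAULTS = {
    "degraded_thresh": 1.0,
    "misplaced_thresh": 1.0,
    "recovery_rate": 1.0,
}
PROPOSED_DEFAULTS = {
    "degraded_thresh": 0.1,
    "misplaced_thresh": 0.1,
    "recovery_rate": 100.0,
}


def evaluate(degraded_ratio, misplaced_ratio, recovery_objects_per_sec, thresholds):
    """Return the reasons the check would go critical (empty list means OK)."""
    problems = []
    if degraded_ratio >= thresholds["degraded_thresh"]:
        problems.append("degraded")
    if misplaced_ratio >= thresholds["misplaced_thresh"]:
        problems.append("misplaced")
    # Only judge the recovery rate while there is something to recover.
    recovering = degraded_ratio > 0 or misplaced_ratio > 0
    if recovering and recovery_objects_per_sec < thresholds["recovery_rate"]:
        problems.append("recovery stalled")
    return problems


# Example cluster state: 25% degraded, 5% misplaced, recovering at 40 objects/s.
old_result = evaluate(0.25, 0.05, 40.0, OLD_DEFAULTS)
new_result = evaluate(0.25, 0.05, 40.0, PROPOSED_DEFAULTS)
print(old_result)
print(new_result)
```

Under these assumptions, a cluster with a quarter of its objects degraded stays silent with the old defaults and alerts with the proposed ones.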

I think nagios_check_num_osds should remain disabled by default, as the check being implemented as part of bug 1952985 would effectively supersede it.

Tags: bseng-448
Peter Sabaini (peter-sabaini) wrote :

Picking one-size-fits-all thresholds for these checks is borderline impossible due to widely varying use cases. Having said that, I agree that these defaults

nagios_degraded_thresh: from 1 to 0.1
nagios_misplaced_thresh: from 1 to 0.1

are way low for almost any use case; one might as well have no monitoring at all.

A case could be made for a lax recovery rate check, though, as many clouds would prefer client traffic over recovery.

Changed in charm-ceph-mon:
importance: Undecided → Low
status: New → Confirmed
Andrea Ieri (aieri)
tags: added: bseng-448
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-mon (master)
Changed in charm-ceph-mon:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-mon (master)

Reviewed: https://review.opendev.org/c/openstack/charm-ceph-mon/+/860344
Committed: https://opendev.org/openstack/charm-ceph-mon/commit/845111311bcdcebb62c5c82d160abb7b6476d5db
Submitter: "Zuul (22348)"
Branch: master

commit 845111311bcdcebb62c5c82d160abb7b6476d5db
Author: Chi Wai, Chan <email address hidden>
Date: Wed Oct 5 14:56:23 2022 +0800

    Make check_ceph_status.py a bit more "noisy" by default.

    Closes-Bug: #1989154
    Change-Id: Ie0d73f14698e4f3ba4e7231920a622f587b4330f

Changed in charm-ceph-mon:
status: In Progress → Fix Committed