Juju nrpe charm sets load alert threshold values incorrectly

Bug #1777764 reported by Mian
This bug affects 2 people
Affects      Status      Importance   Assigned to   Milestone
NRPE Charm   Won't Fix   Wishlist     Unassigned

Bug Description

There are two defects in the Juju nrpe charm:

1. The description of the 'load' config option is out of date and is misleading when the option is set to 'auto'
Definition per [1]:
if 'auto' is set, then NUM_CPUS*0.7 is used for the warning threshold and NUM_CPUS for critical
Actual behaviour per [2] and testing:
if 'auto' is set, then for the load averages over 1 minute, 5 minutes and 15 minutes respectively
(NUM_CPUS*4, NUM_CPUS*2, NUM_CPUS*1) is used for the warning threshold
(NUM_CPUS*8, NUM_CPUS*4, NUM_CPUS*2) is used for the critical threshold

2. If 'auto' is not suitable in the cloud, the only alternative is a hard-coded setting, which is rigid and inflexible
As the customer complained:
"The other problem is that if I want to override these numbers in the juju config I can only specify it in the format "-w 8,8,8 -c 15,15,15" and this does not take into account numcpu. Since some units have 88 cores (e.g. nova-compute-kvm) and other have 2 (e.g. kibana) it makes it impossible to set one load number that is appropriate for all systems."

How to reproduce

1. Deploy nrpe and relevant relationships with Juju
$ juju deploy ubuntu
$ juju deploy nrpe
$ juju deploy nagios
$ juju add-relation ubuntu nrpe
$ juju add-relation nrpe:monitors nagios:monitors
$ juju config nrpe load=auto

2. Reboot the ubuntu node with 1, 2 and 3 CPU cores allocated in turn, and check the check_load configuration after each boot

First boot with 1 CPU core
root@6nova:~# more /etc/nagios/nrpe.d/check_load.cfg ;cat /proc/cpuinfo | grep processor
# System Load (sub)
command[check_load]=/usr/lib/nagios/plugins/check_load -w 4,2,1 -c 8,4,2
processor : 0

Second boot with 2 CPU cores
root@6nova:~# more /etc/nagios/nrpe.d/check_load.cfg ;cat /proc/cpuinfo | grep processor
# System Load (sub)
command[check_load]=/usr/lib/nagios/plugins/check_load -w 8,4,2 -c 16,8,4
processor : 0
processor : 1
root@6nova:~#

Third boot with 3 CPU cores
root@6nova:~# more /etc/nagios/nrpe.d/check_load.cfg ;cat /proc/cpuinfo | grep processor
# System Load (sub)
command[check_load]=/usr/lib/nagios/plugins/check_load -w 12,6,3 -c 24,12,6
processor : 0
processor : 1
processor : 2
root@6nova:~#
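The three check_load.cfg outputs above are consistent with the multipliers listed under defect 1. As a minimal sketch (not the charm's actual code; the function name auto_load_thresholds is hypothetical), the observed 'auto' behaviour can be reproduced like this:

```python
def auto_load_thresholds(num_cpus):
    """Build the check_load arguments observed when load=auto.

    Hypothetical helper for illustration only. The multipliers are the
    ones reported in this bug: warning = NUM_CPUS * (4, 2, 1) and
    critical = NUM_CPUS * (8, 4, 2) for the 1/5/15 minute load averages.
    """
    warn = [num_cpus * m for m in (4, 2, 1)]
    crit = [num_cpus * m for m in (8, 4, 2)]
    return "-w {} -c {}".format(
        ",".join(str(v) for v in warn),
        ",".join(str(v) for v in crit),
    )


if __name__ == "__main__":
    # Core counts used in the three boots above.
    for cpus in (1, 2, 3):
        print(cpus, auto_load_thresholds(cpus))
```

For 1, 2 and 3 cores this yields the same -w/-c strings as the three boots shown above.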

3. Hard-coded setting: the setting below is pushed to all nodes regardless of their number of CPU cores
$ juju config nrpe load="-w 1.0,0.80,0.70 -c 5,2,1"
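To illustrate why one absolute setting does not fit all units, the sketch below divides the warning thresholds from the command above by the core count. The helper name per_core_thresholds is hypothetical, and the core counts (2 and 88) are taken from the customer's example.

```python
def per_core_thresholds(thresholds, num_cpus):
    """Express absolute load thresholds as load per CPU core.

    Hypothetical helper for illustration only; it just divides each
    threshold by the number of cores.
    """
    return [t / num_cpus for t in thresholds]


if __name__ == "__main__":
    # Warning thresholds from: load="-w 1.0,0.80,0.70 -c 5,2,1"
    warn = (1.0, 0.80, 0.70)
    # 2 cores (e.g. kibana) and 88 cores (e.g. nova-compute-kvm)
    for cores in (2, 88):
        scaled = [round(v, 4) for v in per_core_thresholds(warn, cores)]
        print(cores, "cores ->", scaled)
```

The same absolute threshold corresponds to a far lower per-core load on the 88-core unit than on the 2-core unit, so that unit would alert much earlier relative to its capacity.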

[1] https://jujucharms.com/nrpe/49
[2] https://git.launchpad.net/nrpe-charm/commit/?id=009a1ae855fe9e4d648ac800b5e7d78a4efb00a6


John A Meinel (jameinel)
affects: juju → nrpe-charm
Shane Peters (shaner)
Changed in nrpe-charm:
assignee: nobody → Shane Peters (shaner)
Changed in nrpe-charm:
assignee: Shane Peters (shaner) → Aurelien Lourot (aurelien-lourot)
Aurelien Lourot (aurelien-lourot) wrote :

1. Documentation of 'auto' has been fixed meanwhile.
2. Complaint about rigidity of the 'load' config is still valid.

Changed in nrpe-charm:
importance: Undecided → Wishlist
status: New → Triaged
assignee: Aurelien Lourot (aurelien-lourot) → nobody
Brett Milford (brettmilford) wrote :

check_load has the parameter:
 -r, --percpu
    Divide the load averages by the number of CPUs (when possible)

Thus in the case of 2) one may still set thresholds as a proportion of the core count, i.e.
'-r -w 0.9,0.8,0.7 -c 1.5,1.2,1'
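A simplified model of what per-CPU scaling means for the thresholds above is sketched below. It is not the plugin's code: the function name is hypothetical, and the strict "greater than" comparison is an assumption about how check_load compares values.

```python
def check_load_per_cpu(load_avgs, num_cpus, warn, crit):
    """Simplified model of check_load with -r (per-CPU) scaling.

    Hypothetical illustration: each load average is divided by the
    number of CPUs before being compared with the thresholds. The
    strict '>' comparison is an assumption, not taken from the plugin.
    """
    scaled = [la / num_cpus for la in load_avgs]
    if any(s > c for s, c in zip(scaled, crit)):
        return "CRITICAL"
    if any(s > w for s, w in zip(scaled, warn)):
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Thresholds from: -r -w 0.9,0.8,0.7 -c 1.5,1.2,1
    warn = (0.9, 0.8, 0.7)
    crit = (1.5, 1.2, 1.0)
    # The same absolute load on a 2-core and an 88-core unit.
    for cores in (2, 88):
        print(cores, check_load_per_cpu((1.5, 1.5, 1.5), cores, warn, crit))
```

With scaling, a single threshold string can be applied to units with very different core counts, because the comparison is made on load per core rather than on absolute load.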

Eric Chen (eric-chen)
Changed in charm-nrpe:
status: Triaged → Won't Fix