nrpe charm xenial incompatibility

Bug #1552638 reported by Tom Haddon
This bug affects 2 people
Affects: nrpe (Juju Charms Collection)
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

Currently lp:charms/trusty/nrpe delivers checks that will fail on xenial with two issues.

The first is that /usr/lib/nagios/plugins/check_swap has changed behaviour on xenial: if the instance has no swap, the default check of "-w 90% -c 75%" will fail. A new option has been introduced:

 -n, --no-swap=<ok|warning|critical|unknown>
    Resulting state when there is no swap regardless of thresholds. Default: CRITICAL

So, nrpe needs to detect if we're on xenial and pass "-n ok" in addition to the default "-w 90% -c 75%".
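
A minimal sketch of that release check, not the charm's actual code. The function name, the constant names, and the idea of passing the release codename in as a plain string are assumptions made for illustration; only the two argument strings come from this report.

```python
# Default thresholds quoted in this bug report.
DEFAULT_SWAP_THRESHOLDS = "-w 90% -c 75%"

# Assumption: only xenial is listed because it is the only release this
# report confirms has the -n/--no-swap option.
RELEASES_WITH_NO_SWAP_OPTION = {"xenial"}


def swap_check_args(codename):
    """Return check_swap arguments for the given release codename.

    Hypothetical helper: the codename would come from wherever the
    charm already learns the release (e.g. lsb_release output).
    """
    args = DEFAULT_SWAP_THRESHOLDS
    if codename in RELEASES_WITH_NO_SWAP_OPTION:
        # Treat "no swap configured" as OK instead of the CRITICAL default.
        args += " -n ok"
    return args


if __name__ == "__main__":
    for release in ("trusty", "xenial"):
        print(release, "->", swap_check_args(release))
```

Keeping the release list in one set would make it easy to extend if later releases turn out to need the same flag.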

The second issue is that the default process-count thresholds used by the "auto" (default) option are too low for xenial. On an instance with the apache2 charm we see 82 processes on trusty and 129 on xenial. The defaults are:

 proc_thresholds = "-w {} -c {}".format(25 * procs + 100, 50 * procs + 100)

With procs = 1, this results in "/usr/lib/nagios/plugins/check_procs -w 125 -c 150", so 129 processes pushes us into warning.
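
The arithmetic can be reproduced with a short sketch. The formula is the one quoted above; the function names and the classification logic are illustrative, and the strict "greater than" comparison assumes the usual Nagios range semantics, where a bare threshold N alerts when the value is above N.

```python
def proc_thresholds(procs):
    """Return (warning, critical) using the formula quoted in this report."""
    return 25 * procs + 100, 50 * procs + 100


def classify(count, procs):
    """Hypothetical helper: classify a process count against the thresholds.

    Assumes check_procs alerts when the count exceeds the threshold.
    """
    warn, crit = proc_thresholds(procs)
    if count > crit:
        return "CRITICAL"
    if count > warn:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Process counts from this report: 82 on trusty, 129 on xenial.
    for release, count in (("trusty", 82), ("xenial", 129)):
        print(release, count, classify(count, procs=1))
```

With procs = 1 the thresholds are 125 and 150, so the trusty count of 82 stays OK while the xenial count of 129 lands in WARNING.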

tags: added: kanban-cross-team landscape
Stuart Bishop (stub) wrote :

Bug #1599965 is about Python3 (the current charm only works by accident if Python2 is installed by the primary). This bug is about improving the checks.

Changed in nrpe (Juju Charms Collection):
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Stuart Bishop (stub)
assignee: Stuart Bishop (stub) → nobody
status: In Progress → Triaged
importance: Critical → High
Paul Collins (pjdc) wrote :

check_procs also has a "-k" option to ignore kernel threads that we should consider using. This would leave us less sensitive to kernel changes, although we would also be more exposed to kernel bugs that spawn lots of threads. But that seems like optimizing for the wrong case, and being able to tune more straightforwardly for the actual userspace workload would be an improvement.
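
As a rough sketch of what opting in to "-k" could look like when building the command line: the helper name and the ignore_kernel_threads parameter are hypothetical, and whether "-k" is available in the check_procs shipped with every supported release is an assumption that would need checking.

```python
CHECK_PROCS = "/usr/lib/nagios/plugins/check_procs"


def check_procs_command(warn, crit, ignore_kernel_threads=False):
    """Build a check_procs command line (hypothetical helper).

    ignore_kernel_threads adds the "-k" option mentioned in this comment.
    """
    cmd = [CHECK_PROCS, "-w", str(warn), "-c", str(crit)]
    if ignore_kernel_threads:
        # Count only userspace processes, skipping kernel threads.
        cmd.append("-k")
    return " ".join(cmd)


if __name__ == "__main__":
    print(check_procs_command(125, 150))
    print(check_procs_command(125, 150, ignore_kernel_threads=True))
```

If kernel threads were excluded, the existing thresholds would presumably need retuning, since the counts of 82 and 129 quoted in the description include them.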
