nrpe check_disk should ignore /snap mountpoints

Bug #1710239 reported by Drew Freiberger
This bug affects 3 people
Affects                      Status     Importance  Assigned to  Milestone
NRPE Charm                   Won't Fix  Undecided   Unassigned
monitoring-plugins (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

As we start installing snaps into our environments for things such as prometheus exporters, we're finding that the disk_root config in the nrpe charm needs /snap added to its -i ignore list. Without it, the check raises CRITICAL alerts because the snap loop mounts always report 100% usage, even when the root disk itself is only 6% utilized, as below:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             28G     0   28G   0% /dev
tmpfs            51G  4.1G   47G   9% /run
/dev/sda1       2.0T  104G  1.8T   6% /
tmpfs           252G  4.0K  252G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           252G     0  252G   0% /sys/fs/cgroup
/dev/sde        3.7T  328M  3.7T   1% /srv/node/sde
/dev/sdf        3.7T  327M  3.7T   1% /srv/node/sdf
/dev/sdg        3.7T  325M  3.7T   1% /srv/node/sdg
/dev/bcache2    3.7T   37G  3.7T   1% /srv/ceph/ceph3
/dev/bcache3    3.7T   41G  3.6T   2% /srv/ceph/ceph2
/dev/bcache1    3.7T   41G  3.6T   2% /srv/ceph/ceph1
/dev/bcache0    1.7T  272M  1.7T   1% /srv/nova/instances
cgmfs           100K     0  100K   0% /run/cgmanager/fs
/dev/loop0       81M   81M     0 100% /snap/core/2381
/dev/loop1      5.5M  5.5M     0 100% /snap/prometheus-ceph-exporter/12
/dev/loop2       81M   81M     0 100% /snap/core/2462
tmpfs            51G     0   51G   0% /run/user/1001

Our typical config value for disk_root:
-u GB -w 25% -c 20% -K 5% -A -i '/dev/pts|/run|/sys/fs|udev|/boot/efi|/sys/kernel/debug/tracing'

I'd suggest that many of these ignores should be built into check_disk itself as default exclusions for the percent-utilization checks.

The workaround at the moment is to append |/snap to the end of the -i value in the charm config.
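To illustrate why the workaround helps, here is a minimal Python sketch of how the -i pattern filters mounts. It assumes that check_disk applies the pattern as an unanchored, case-insensitive regular expression against both the device name and the mount point; the function name checked_mounts is hypothetical and only exists for this illustration. The mount list is copied from the df output above.

```python
import re

# (device, mount point) pairs copied from the df output in this report.
MOUNTS = [
    ("udev", "/dev"),
    ("tmpfs", "/run"),
    ("/dev/sda1", "/"),
    ("tmpfs", "/dev/shm"),
    ("tmpfs", "/run/lock"),
    ("tmpfs", "/sys/fs/cgroup"),
    ("/dev/sde", "/srv/node/sde"),
    ("/dev/sdf", "/srv/node/sdf"),
    ("/dev/sdg", "/srv/node/sdg"),
    ("/dev/bcache2", "/srv/ceph/ceph3"),
    ("/dev/bcache3", "/srv/ceph/ceph2"),
    ("/dev/bcache1", "/srv/ceph/ceph1"),
    ("/dev/bcache0", "/srv/nova/instances"),
    ("cgmfs", "/run/cgmanager/fs"),
    ("/dev/loop0", "/snap/core/2381"),
    ("/dev/loop1", "/snap/prometheus-ceph-exporter/12"),
    ("/dev/loop2", "/snap/core/2462"),
    ("tmpfs", "/run/user/1001"),
]

# The -i value from our typical disk_root config, and the workaround value.
CURRENT_IGNORE = "/dev/pts|/run|/sys/fs|udev|/boot/efi|/sys/kernel/debug/tracing"
WORKAROUND_IGNORE = CURRENT_IGNORE + "|/snap"


def checked_mounts(ignore_regex, mounts=MOUNTS):
    """Return the mount points that survive the ignore pattern.

    Assumption: the pattern is searched (not anchored), case-insensitively,
    in both the device name and the mount point.
    """
    pattern = re.compile(ignore_regex, re.IGNORECASE)
    return [
        mountpoint
        for device, mountpoint in mounts
        if not (pattern.search(device) or pattern.search(mountpoint))
    ]


if __name__ == "__main__":
    print("still checked today:   ", checked_mounts(CURRENT_IGNORE))
    print("checked with |/snap:   ", checked_mounts(WORKAROUND_IGNORE))
```

Under that assumption the pattern is unanchored, so /snap would also match an unrelated path such as /srv/snapshots. A stricter alternative like ^/snap/ may be worth considering if such paths exist.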

As we expect snaps to become a universal packaging mechanism for many charms as we go forward, we should ensure that our operational tooling understands them and treats them in the proper manner by default.

This may be something to resolve upstream in the monitoring-plugins-basic package itself, which provides the /usr/lib/nagios/plugins/check_disk executable.

Haw Loeung (hloeung)
Changed in nrpe-charm:
status: New → In Progress
assignee: nobody → Haw Loeung (hloeung)
Haw Loeung (hloeung) wrote :

I had a look into this and can't see it in the charm itself:

| https://git.launchpad.net/nrpe-charm/tree/config.yaml#n48
| https://git.launchpad.net/nrpe-charm/tree/hooks/nrpe_helpers.py#n346

Is this a custom config overriding the default? Or is some other charm in use?

Changed in nrpe-charm:
status: In Progress → Incomplete
assignee: Haw Loeung (hloeung) → nobody
Drew Freiberger (afreiberger) wrote :

This is a custom config that we set on disk_root.

I think this should actually be filed against the monitoring-plugins-basic package, as it's the .deb that provides /usr/lib/nagios/plugins/check_disk.

I know that check_disk already ignores things like sysfs and similar. It should also ignore loop mounts (whether snap loops, ISO loops, or otherwise), perhaps unless the filesystem is a valid read/write type.
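As a rough sketch of that rule (not how check_disk is implemented), the Python below picks out loop mounts that are mounted read-only from text in /proc/mounts format. The function names and the sample lines are hypothetical: the devices and mount points come from the df output in the description, while the filesystem types and mount options are assumed for illustration.

```python
def is_readonly_loop_mount(device, options):
    """True for a loop-backed device mounted read-only (the proposed skip rule)."""
    return device.startswith("/dev/loop") and "ro" in options.split(",")


def skippable_mounts(mounts_text):
    """Return mount points the proposed rule would exclude from usage checks.

    Expects lines in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    skipped = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, _fstype, options = fields[:4]
        if is_readonly_loop_mount(device, options):
            skipped.append(mountpoint)
    return skipped


# Illustrative sample only: fstype and options are assumptions.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/loop0 /snap/core/2381 squashfs ro,nodev,relatime 0 0
/dev/loop1 /snap/prometheus-ceph-exporter/12 squashfs ro,nodev,relatime 0 0
/dev/loop7 /mnt/scratch ext4 rw,relatime 0 0
"""

if __name__ == "__main__":
    print(skippable_mounts(SAMPLE))
```

A writable loop mount (the hypothetical /mnt/scratch line) is deliberately kept, since it can genuinely fill up and is worth alerting on.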

Haw Loeung (hloeung)
Changed in nrpe-charm:
status: Incomplete → Won't Fix
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in monitoring-plugins (Ubuntu):
status: New → Confirmed