check_disk plugin broken after upgrade to 15.10

Bug #1516451 reported by Ralf G. R. Bergs on 2015-11-15
This bug affects 26 people
Affects: nagios-plugins (Ubuntu)
Importance: Medium
Assigned to: Unassigned

Bug Description

I didn't touch my Nagios config; I just upgraded my system from 15.04 to 15.10. Suddenly the default localhost/Disk Space check fails with the following output:

DISK CRITICAL - /sys/kernel/debug/tracing is not accessible: Permission denied

This can be reproduced when manually running the underlying command as user "nagios":

$ /usr/lib/nagios/plugins/check_disk -w '20%' -c '10%' -e
DISK CRITICAL - /sys/kernel/debug/tracing is not accessible: Permission denied

When I run it as root it works:

# /usr/lib/nagios/plugins/check_disk -w '20%' -c '10%' -e
DISK OK| /dev=0MB;1186;1334;0;1483 /run=8MB;239;269;0;299 /=17157MB;57386;64559;0;71733 /dev/shm=0MB;1199;1349;0;1499 /run/lock=0MB;4;4;0;5 /sys/fs/cgroup=0MB;1199;1349;0;1499 /boot=48MB;181;204;0;227 /run/user/0=0MB;239;269;0;299

It seems the "nagios" user cannot access the directory the plugin tries to check:

# ls -la /sys/kernel/debug/tracing
drwx------ 7 root root 0 Nov 15 19:40 .
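
To see which mount points would trip the check for a given user, something like the following can help. It is a minimal sketch, assuming the plugin fails whenever it cannot stat a mounted path; inaccessible_mounts is a hypothetical helper name, not part of nagios-plugins.

```shell
# Print "<fstype> <mountpoint>" for every mount the current user cannot stat.
# Reads a mount table in /proc/self/mounts format on stdin:
#   device mountpoint fstype options dump pass
# Limitation: mount points containing spaces appear escaped (\040) in that
# file and are not decoded here.
inaccessible_mounts() {
    while read -r _dev mnt fstype _rest; do
        # test -e is false when the path cannot be stat'ed, which includes
        # the case where a parent directory denies search permission.
        if [ ! -e "$mnt" ]; then
            printf '%s %s\n' "$fstype" "$mnt"
        fi
    done
}

# Usage, run as the user the check runs under (assumed here to be "nagios"):
#   inaccessible_mounts < /proc/self/mounts
```

Any path it prints is a candidate for exclusion or for a closer look at permissions.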

# lsb_release -rd
Description: Ubuntu 15.10
Release: 15.10

# apt-cache policy nagios-plugins-basic
nagios-plugins-basic:
  Installed: 1.5-3ubuntu1
  Candidate: 1.5-3ubuntu1
  Version table:
 *** 1.5-3ubuntu1 0
        500 http://de.archive.ubuntu.com/ubuntu/ wily/main amd64 Packages
        100 /var/lib/dpkg/status

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nagios-plugins (Ubuntu):
status: New → Confirmed
Ben Coleman (oloryn) wrote :

Also note that while 15.10 does change the permissions on /sys/kernel/debug/tracing (from drwxr-xr-x in 15.04 to drwx------ in 15.10), the permissions on /sys/kernel/debug are drwx------ on both 15.10 and 15.04 - which means that /sys/kernel/debug shouldn't be readable from a non-root account on either release, so this looks like a code change in check_disk.

Brian Morton (rokclimb15) wrote :

strace confirms that check_disk on 12.04 doesn't check /sys/kernel/debug/tracing.

I'm not having any luck tracking down a code change in the monitoring-plugins GitHub repo. I wonder if this is a change in a dependent library instead.

Here's a workaround, which makes check_disk run setuid root:

sudo chown root:root /usr/lib/nagios/plugins/check_disk
sudo chmod u+s /usr/lib/nagios/plugins/check_disk
sudo chmod o+x /usr/lib/nagios/plugins/check_disk

Brian Morton (rokclimb15) wrote :

I suspect there isn't a code change here, but rather a difference in the way Ubuntu presents its mount points. The plugin tries to enumerate and check all mounts. A better approach might be to list the actual mount points to monitor with -p:

/usr/lib/nagios/plugins/check_disk -w '20%' -c '10%' -e -p / -p /var -p /boot
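
The -p list can be derived from the mount table instead of typed by hand. The sketch below assumes that the filesystems worth monitoring are exactly those whose device field starts with /dev/ (so network filesystems are left out); build_check_disk_cmd is a hypothetical helper name, and the thresholds are the ones used above.

```shell
# Print a check_disk command line naming each device-backed mount with -p.
# Reads a mount table in /proc/self/mounts format on stdin.
# Assumption: "real" filesystems have a device field starting with /dev/.
build_check_disk_cmd() {
    args=""
    while read -r dev mnt _rest; do
        case $dev in
            /dev/*) args="$args -p $mnt" ;;
        esac
    done
    # %% prints a literal percent sign in the printf format string.
    printf "/usr/lib/nagios/plugins/check_disk -w '20%%' -c '10%%' -e%s\n" "$args"
}

# Usage:
#   build_check_disk_cmd < /proc/self/mounts
```

The function only prints the command line, so it can be reviewed before being pasted into a command definition.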

12.04:
mount
<snip>
none on /sys/kernel/debug type debugfs (rw)
<snip>

14.04:

debugfs on /sys/kernel/debug type debugfs (rw,relatime)
tracefs on /sys/kernel/debug/tracing type tracefs (rw,relatime)
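
To compare releases the way the mount listings above do, it can help to reduce the mount table to its distinct filesystem types. This is a rough sketch that reads the /proc/self/mounts format rather than the output of mount; mount_types is a hypothetical helper name.

```shell
# Print each distinct filesystem type found in a mount table
# (/proc/self/mounts format: device, mount point, type, options, ...).
mount_types() {
    awk '{ print $3 }' | sort -u
}

# Usage:
#   mount_types < /proc/self/mounts
```

Running it on two releases and comparing the output shows which types, such as tracefs, appear only on the newer one.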

Robie Basak (racb) wrote :

Thank you for taking the time to report and investigate this bug and helping to make Ubuntu better.

It sounds to me like check_disk should have a blacklist of filesystem types to ignore. But explicitly specifying which mount points looks like a suitable workaround.

I wonder if this affects monitoring-plugins in Xenial?

Changed in nagios-plugins (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
Gabriele Tozzi (gabriele-tozzi) wrote :

You can use the --exclude-type option to work around this bug:

/usr/lib/nagios/plugins/check_disk -e --exclude-type=tracefs

The same probably applies to gvfs-fuse filesystems. I recommend:

--exclude-type=tracefs --exclude-type=fuse.gvfsd-fuse

Darragh Grealish (grealish) wrote :

This is also broken in Ubuntu 16.04; however, the workaround mentioned above works:
/usr/lib/nagios/plugins/check_disk -e --exclude-type=tracefs

TomaszChmielewski (mangoo-wpkg) wrote :

The workaround does not work well when LXD/LXC is in use:

$ /usr/lib/nagios/plugins/check_disk -e --exclude-type=tracefs
DISK CRITICAL - /run/lxcfs/controllers is not accessible: Permission denied

$ /usr/lib/nagios/plugins/check_disk -e --exclude-type=tracefs --exclude-type=cgroup
DISK CRITICAL - /run/lxcfs/controllers is not accessible: Permission denied

$ /usr/lib/nagios/plugins/check_disk -e --exclude-type=tracefs --exclude-type=tmpfs
DISK CRITICAL - /run/lxcfs/controllers/blkio is not accessible: Permission denied

So it only works when we exclude all three above, including tmpfs:

$ /usr/lib/nagios/plugins/check_disk -e --exclude-type=tracefs --exclude-type=cgroup --exclude-type=tmpfs

However, tmpfs is very often used for /tmp and /dev/shm, which are also important to monitor, and --exclude-type=tmpfs makes the check skip these mount points.
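
Before adding an --exclude-type value, it is possible to preview which mount points it would drop. The following is a minimal sketch that assumes exclusion is an exact match on the filesystem type; skipped_mounts is a hypothetical helper name.

```shell
# Print the mount points that a set of --exclude-type values would skip.
# Arguments: filesystem types. Stdin: mount table in /proc/self/mounts format.
# Assumption: a mount is skipped when its type equals one of the arguments.
skipped_mounts() {
    types=" $* "
    while read -r _dev mnt fstype _rest; do
        case $types in
            *" $fstype "*) printf '%s\n' "$mnt" ;;
        esac
    done
}

# Usage:
#   skipped_mounts tracefs cgroup tmpfs < /proc/self/mounts
```

If /tmp or /dev/shm shows up in the output, excluding that type hides mounts that should stay monitored.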

Nicholas Sherlock (n-sherlock) wrote :

Rather than excluding tmpfs, just exclude /run/lxcfs/controllers. This is the check_all_disks command I'm now using in my /etc/nagios-plugins/config/disk.cfg:

# 'check_all_disks' command definition
define command{
    command_name check_all_disks
    command_line /usr/lib/nagios/plugins/check_disk -w '$ARG1$' -c '$ARG2$' -e -A --exclude-type=tracefs --exclude-type=cgroup --exclude_device=/run/lxcfs/controllers
}

Danny Howard (dannyman) wrote :

We normally netboot, but we had a few machines that would not PXE boot, so we installed 14.04 from installation media. Afterwards, some of the media-installed machines were throwing this error in Nagios. I found this line in /etc/mtab on the afflicted hosts:

tracefs /var/lib/ureadahead/debugfs/tracing tracefs rw,relatime 0 0

This appears to be an artifact of the media-based install process. I removed the above line and ran:

sudo service nagios-nrpe-server restart

Error condition cleared.

Danny Howard (dannyman) wrote :

Possibly related to bug #499773, which is about the installer adding spurious entries to mtab.

Marius Gedminas (mgedmin) wrote :

These days /etc/mtab is a symlink to /proc/self/mounts, so you cannot control what is exposed there.
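
Whether a given host has an editable /etc/mtab or a symlink can be checked directly. This is a small sketch; describe_mtab is a hypothetical helper name, and the exact symlink target may differ between releases.

```shell
# Report whether a path (default: /etc/mtab) is a symlink, a regular file,
# or missing. A symlink into /proc cannot be edited by hand.
describe_mtab() {
    path=${1:-/etc/mtab}
    if [ -L "$path" ]; then
        printf 'symlink -> %s\n' "$(readlink "$path")"
    elif [ -e "$path" ]; then
        printf 'regular file\n'
    else
        printf 'missing\n'
    fi
}

# Usage:
#   describe_mtab
```

On a host where it reports a regular file, stale entries like the tracefs line above can still be removed by hand.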

Gerald Combs (gerald.combs) wrote :

This appears to be fixed upstream via https://github.com/Icinga/icinga2/issues/4184

Gerald Combs (gerald.combs) wrote :

Oops - please disregard comment #14 - it's specific to Icinga.

Ian Gibbs (realflash-uk) wrote :

Since check_all_disks is internally defined in Nagios, you might well see a "duplicate definition" error if you define your own check_all_disks command. I'd recommend

define command{
    command_name check_all_physical_disks
    command_line /usr/lib/nagios/plugins/check_disk -w '$ARG1$' -c '$ARG2$' -e -A --exclude-type=tracefs --exclude-type=cgroup --exclude_device=/run/lxcfs/controllers
}

instead, and then call that in your host definition:

define service {
 use generic-service
 hostgroup_name all
 service_description Disk Space
 check_command check_all_physical_disks!6%!4%
}
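
To make the argument passing concrete: the values after each "!" in check_command fill $ARG1$ and $ARG2$ in command_line. The sketch below mimics only that substitution for the definition above; it is a simplified illustration, not how Nagios itself is implemented, and expand_check_command is a hypothetical helper name.

```shell
# Expand a check_command string of the form "name!arg1!arg2" into the
# command line of the check_all_physical_disks definition.
# Simplified: only $ARG1$ and $ARG2$ are handled, with no escaping rules.
expand_check_command() {
    # Split the string on "!" into the command name and its two arguments.
    IFS='!' read -r _name arg1 arg2 <<EOF
$1
EOF
    printf "/usr/lib/nagios/plugins/check_disk -w '%s' -c '%s' -e -A --exclude-type=tracefs --exclude-type=cgroup --exclude_device=/run/lxcfs/controllers\n" "$arg1" "$arg2"
}

# Usage:
#   expand_check_command 'check_all_physical_disks!6%!4%'
```

Running the printed command as the nagios user is a quick way to test the thresholds before reloading the configuration.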
