check_all_disks includes squashfs /snap/* which are 100%
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Nagios Charm | Fix Released | Undecided | Unassigned |
coreutils (Ubuntu) | Fix Released | Undecided | Unassigned |
monitoring-plugins (Ubuntu) | Fix Released | Low | Bryce Harrington |
Bug Description
[Impact]
Monitoring tools generate false-positive reports when synthetic filesystems are mounted. These filesystems show 100% disk utilization and thus add unnecessary (but dire-sounding) "DISK CRITICAL" noise.
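Simple arithmetic shows why such a mount always trips the check. A squashfs image (the case in this report's title) is mounted read-only and sized to its contents, so it reports no free blocks, and any percentage-based "free space" threshold fires. The following is a hypothetical C sketch: `free_percent()` and `is_critical()` are names invented here, not functions from check_disk.c, and the threshold test only mirrors the idea behind check_disk's `-c` option (CRITICAL when less than the given percentage of space is free).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: percentage of a filesystem's blocks that are free.
 * A zero-sized filesystem is treated as 100% free; that is a choice made
 * for this sketch, and it also avoids dividing by zero.
 */
static double free_percent(uint64_t total_blocks, uint64_t free_blocks)
{
    if (total_blocks == 0)
        return 100.0;
    return 100.0 * (double)free_blocks / (double)total_blocks;
}

/*
 * Hypothetical threshold test: CRITICAL when less than crit_free_pct
 * percent of the filesystem is free.
 */
static bool is_critical(uint64_t total_blocks, uint64_t free_blocks,
                        double crit_free_pct)
{
    return free_percent(total_blocks, free_blocks) < crit_free_pct;
}
```

For example, a 1000-block snap image with 0 free blocks is CRITICAL at any positive threshold, while a filesystem with 500 of 1000 blocks free is not CRITICAL at a 10% threshold.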
[Test Case]
$ lxc create ubuntu-
$ lxc exec lp1827159 bash
# apt-get -y update
# apt-get install monitoring-plugins
# snap install gnome-calculator
[...]
# /usr/lib/
DISK CRITICAL - free space: / 1903 MB (1% inode=78%); /dev 0 MB (100% inode=99%); /dev/full 16018 MB (100% inode=99%); /dev/null 16018 MB (100% inode=99%); /dev/random 16018 MB (100% inode=99%); /dev/tty 16018 MB (100% inode=99%); /dev/urandom 16018 MB (100% inode=99%); /dev/zero 16018 MB (100% inode=99%); /dev/fuse 16018 MB (100% inode=99%); /dev/net/tun 16018 MB (100% inode=99%); /dev/lxd 0 MB (100% inode=99%); /dev/.lxd-mounts 0 MB (100% inode=99%); /dev/shm 16041 MB (100% inode=99%); /run 3208 MB (99% inode=99%); /run/lock 5 MB (100% inode=99%); /sys/fs/cgroup 16041 MB (100% inode=99%); /snap 1903 MB (1% inode=78%); /run/snapd/ns 3208 MB (99% inode=99%);| /=111171MB;
# /usr/lib/
DISK CRITICAL - free space: /dev 0 MB (100% inode=99%); /dev/lxd 0 MB (100% inode=99%); /dev/.lxd-mounts 0 MB (100% inode=99%); /run/lock 5 MB (100% inode=99%);| /=111392MB;
# /usr/lib/
DISK OK| /=111171MB;
[Regression Potential]
As this alters the logic of how out-of-space checks are handled, regressions to watch for would involve filesystem checks reporting incorrectly. These tools underlie several different front-ends, so regression bugs may be filed in a few different places; however, they will tend to display error messages involving check_disk, nagios, and either tmpfs or tracefs.
Note that there are likely other synthetic filesystems beyond tmpfs and tracefs (e.g. udev, usbfs, devtmpfs, fuse.*, ...) which might also cause similar false positives; these should be handled as separate bugs, although they can likely be fixed the same way.
[Fix]
In monitoring-plugins, check_disk.c is modified to exclude the unwanted filesystems by default (see patch).
[Discussion]
Several bug reports have been filed about false positives with different synthetic filesystems (see Dupes), including tracefs, squashfs, and tmpfs. The commonly discussed workaround is to exclude these when running the tools (e.g. by passing the '-X <fs>' parameter to check_all_disks). Since the underlying tools are typically run through wrappers, a string of -X parameters can be added there.
However, a cleaner solution is possible: check_disk.c in monitoring-plugins maintains an internal exclusion list, fs_exclude_list, which already excludes iso9660 and can be extended to exclude other filesystems by default.
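The exclusion list itself is a simple data structure: a list of filesystem type names consulted for each mount, with any mount whose type is listed being skipped. The hypothetical C sketch below illustrates the idea. It is only loosely modeled on the name-list helpers that check_disk.c uses (the np_add_ calls that follow); the names `fs_name`, `fs_list_add()`, `fs_list_contains()`, `should_check()` and `fs_list_free()` are invented here and are not the real API.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical singly linked list of filesystem type names. */
struct fs_name {
    const char *name;       /* not copied: the string must outlive the list */
    struct fs_name *next;
};

/* Prepend a type name; aborts on allocation failure to keep the sketch short. */
static void fs_list_add(struct fs_name **list, const char *name)
{
    struct fs_name *node = malloc(sizeof *node);
    if (node == NULL)
        abort();
    node->name = name;
    node->next = *list;
    *list = node;
}

/* Linear search by exact, case-sensitive name. */
static bool fs_list_contains(const struct fs_name *list, const char *name)
{
    for (; list != NULL; list = list->next)
        if (strcmp(list->name, name) == 0)
            return true;
    return false;
}

/* A mount is checked only if its filesystem type is not excluded. */
static bool should_check(const struct fs_name *exclude, const char *fs_type)
{
    return !fs_list_contains(exclude, fs_type);
}

/* Free every node (the name strings themselves are not owned). */
static void fs_list_free(struct fs_name *list)
{
    while (list != NULL) {
        struct fs_name *next = list->next;
        free(list);
        list = next;
    }
}
```

With "iso9660" and "squashfs" on the list, a squashfs snap mount is skipped while an ext4 root filesystem is still checked.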
Specifically, check_disk.c is modified as follows:
np_add_
np_add_
np_add_
np_add_
This code is added before the command-line parsing logic, so it only sets the default behavior. It does not prevent further adding or removing filesystems via the -X and -N parameters; for instance, anyone who wants to check tmpfs can add it manually with "-N tmpfs".
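Because the defaults are registered before the options are parsed, later -X and -N flags still apply on top of them. The hypothetical C sketch below illustrates that ordering. Everything in it is an assumption made for illustration: the struct and function names are invented; the default list (iso9660 plus squashfs, tmpfs and tracefs) is inferred from the patch name rather than read from the patch; -N is treated as an "only check these types" filter; and, following the "-N tmpfs" example above, an explicit -N entry is assumed to override a default exclusion.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TYPES 32

/* Hypothetical filter state: excluded (-X) and included (-N) types. */
struct fs_filter {
    const char *exclude[MAX_TYPES];
    size_t n_exclude;
    const char *include[MAX_TYPES];
    size_t n_include;
};

static bool in_set(const char *const *set, size_t n, const char *type)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i], type) == 0)
            return true;
    return false;
}

/* Append a type; entries beyond MAX_TYPES are silently ignored here. */
static void push(const char **set, size_t *n, const char *type)
{
    if (*n < MAX_TYPES)
        set[(*n)++] = type;
}

static void fs_filter_init(struct fs_filter *f, int argc, char **argv)
{
    /* Assumed defaults, inferred from exclude-tmpfs-squashfs-tracefs.patch. */
    static const char *const defaults[] = { "iso9660", "squashfs", "tmpfs", "tracefs" };

    f->n_exclude = 0;
    f->n_include = 0;

    /* 1. Defaults are registered first ... */
    for (size_t i = 0; i < sizeof defaults / sizeof defaults[0]; i++)
        push(f->exclude, &f->n_exclude, defaults[i]);

    /* 2. ... then the command line is parsed (only "-X TYPE" and "-N TYPE"). */
    for (int i = 1; i + 1 < argc; i++) {
        if (strcmp(argv[i], "-X") == 0)
            push(f->exclude, &f->n_exclude, argv[++i]);
        else if (strcmp(argv[i], "-N") == 0)
            push(f->include, &f->n_include, argv[++i]);
    }
}

static bool fs_type_checked(const struct fs_filter *f, const char *type)
{
    /* An explicit -N entry wins over any exclusion (assumed behavior). */
    if (in_set(f->include, f->n_include, type))
        return true;
    /* With an include list present, only the listed types are checked. */
    if (f->n_include > 0)
        return false;
    return !in_set(f->exclude, f->n_exclude, type);
}
```

With no options, squashfs is skipped and ext4 is checked; adding "-X ext4" skips ext4 as well; and "-N tmpfs" makes tmpfs checked again (under this sketch's -N semantics, it also restricts the check to tmpfs).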
[Original Report]
When Nagios is used to monitor its own host and that host is not a container, the template for checking disk space on the Nagios host does not exclude any snap filesystems. This means we get a CRITICAL report whenever any snap is installed.
This can be changed by adding '-X squashfs' to the check_all_disks command, but that command is defined in the nagios plugins package.
(Or, perhaps '-X tmpfs'? -- bryce)
Related branches
- Utkarsh Gupta (community): Needs Information
- Bryce Harrington (community): Approve
Diff: 89 lines (+47/-13), 2 files modified: debian/changelog (+9/-0), debian/patches/exclude-tmpfs-squashfs-tracefs.patch (+38/-13)
- Christian Ehrhardt (community): Needs Fixing
- Canonical Server: Pending requested
- git-ubuntu developers: Pending requested
Diff: 78 lines (+44/-1), 4 files modified: debian/changelog (+9/-0), debian/control (+2/-1), debian/patches/exclude-tmpfs-squashfs-tracefs.patch (+30/-0), debian/patches/series (+3/-0)
- James Hebden (community): Approve
Diff: 27 lines (+9/-1), 1 file modified: hooks/templates/localhost_nagios2.cfg.tmpl (+9/-1)
- James Hebden (community): Needs Fixing
Diff: 27 lines (+9/-1), 1 file modified: hooks/templates/localhost_nagios2.cfg.tmpl (+9/-1)
description: updated
description: updated
tags: added: patch
Changed in monitoring-plugins (Ubuntu):
assignee: nobody → Bryce Harrington (bryce)
description: updated
description: updated
Changed in coreutils (Ubuntu):
status: New → Fix Committed
Changed in monitoring-plugins (Ubuntu Bionic):
assignee: nobody → Hua Zhang (zhhuabj)
importance: Undecided → Medium
tags: added: sts sts-sponsor-dgadomski
Changed in coreutils (Ubuntu Bionic):
status: New → Invalid
no longer affects: coreutils (Ubuntu Bionic)
no longer affects: monitoring-plugins (Ubuntu Bionic)
This has been fixed in nagios-32 by deploying a custom check_all_disks_no_squashfs check.