Debian: collectd disk usage check misses some filesystems

Bug #1979367 reported by Gerry Kopec
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  Medium      Gerry Kopec

Bug Description

Brief Description
-----------------

While testing collectd on a Debian build, I noticed that the collectd filesystem usage check misses a number of filesystems because:

/var and /boot/efi are new filesystems on Debian systems
scratch, etcd, platform, extension and backups have different paths on Debian than on CentOS

Without these checks, these filesystems could become full without any alarm being raised.

Severity
--------
Minor

Steps to Reproduce
------------------
Boot a Debian build and observe the collectd logs.
Add dummy files to the unmonitored filesystems so that usage exceeds 80% and then 90%.

Expected Behavior
-----------------
Alarms should be raised (major at >80%, critical at >90%)
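
The expected thresholds can be sketched as a small classification function. This is only an illustration of the rule stated above, not the collectd plugin's actual code; the function name and the choice to treat exactly 80% or 90% as not exceeding the threshold are assumptions.

```python
# Hypothetical helper illustrating the thresholds described in this report.
# Assumption: the comparison is strict, so exactly 80.0 raises no alarm.
MAJOR_THRESHOLD = 80.0
CRITICAL_THRESHOLD = 90.0


def classify_usage(percent_used):
    """Return the expected alarm severity for a usage reading, or None."""
    if percent_used > CRITICAL_THRESHOLD:
        return "critical"
    if percent_used > MAJOR_THRESHOLD:
        return "major"
    return None
```

For example, the 62.32% reading for /var/lib/docker in the CentOS log would raise no alarm under this rule.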

Actual Behavior
---------------
No alarms are seen

Reproducibility
---------------
Seen on the first test on Debian

System Configuration
--------------------
One node system - debian

Branch/Pull Time/Commit
-----------------------
Jun. 10 build

Last Pass
---------
First attempt on Debian. Works on CentOS

Timestamp/Logs
--------------
centos
2022-06-13T23:17:48.708 controller-0 collectd[340323]: info alarm notifier reading: 0.01 % usage - /tmp
2022-06-13T23:17:48.708 controller-0 collectd[340323]: info alarm notifier reading: 0.51 % usage - /var/lib/ceph/mon
2022-06-13T23:17:48.708 controller-0 collectd[340323]: info alarm notifier reading: 0.50 % usage - /var/lib/kubelet
2022-06-13T23:17:48.708 controller-0 collectd[340323]: info alarm notifier reading: 0.33 % usage - /scratch
2022-06-13T23:17:48.708 controller-0 collectd[340323]: info alarm notifier reading: 0.18 % usage - /opt/backups
2022-06-13T23:17:48.709 controller-0 collectd[340323]: info alarm notifier reading: 28.06 % usage - /boot
2022-06-13T23:17:48.709 controller-0 collectd[340323]: info alarm notifier reading: 1.12 % usage - /var/log
2022-06-13T23:17:48.709 controller-0 collectd[340323]: info alarm notifier reading: 62.32 % usage - /var/lib/docker
2022-06-13T23:17:48.709 controller-0 collectd[340323]: info alarm notifier reading: 0.01 % usage - /var/lib/nova/instances
2022-06-13T23:17:48.709 controller-0 collectd[340323]: info alarm notifier reading: 16.79 % usage - /var/lib/docker-distribution
2022-06-13T23:17:48.709 controller-0 collectd[340323]: info alarm notifier reading: 3.29 % usage - /opt/etcd
2022-06-13T23:17:48.709 controller-0 collectd[340323]: info alarm notifier reading: 0.40 % usage - /opt/platform
2022-06-13T23:17:48.709 controller-0 collectd[340323]: info alarm notifier reading: 0.25 % usage - /opt/extension
2022-06-13T23:17:48.710 controller-0 collectd[340323]: info alarm notifier reading: 0.33 % usage - /var/lib/rabbitmq
2022-06-13T23:17:48.710 controller-0 collectd[340323]: info alarm notifier reading: 0.69 % usage - /var/lib/postgresql
2022-06-13T23:18:17.112 controller-0 collectd[340323]: info alarm notifier reading: 0.00 % usage - /dev
2022-06-13T23:18:17.113 controller-0 collectd[340323]: info alarm notifier reading: 0.00 % usage - /dev/shm
2022-06-13T23:18:17.113 controller-0 collectd[340323]: info alarm notifier reading: 53.91 % usage - /

debian
2022-06-15T03:56:08.723 controller-0 collectd[2848373]: info alarm notifier reading: 0.01 % usage - /dev/shm
2022-06-15T03:56:08.725 controller-0 collectd[2848373]: info alarm notifier reading: 0.01 % usage - /tmp
2022-06-15T03:56:08.725 controller-0 collectd[2848373]: info alarm notifier reading: 0.31 % usage - /var/lib/ceph/mon
2022-06-15T03:56:08.726 controller-0 collectd[2848373]: info alarm notifier reading: 0.02 % usage - /var/lib/kubelet
2022-06-15T03:56:08.726 controller-0 collectd[2848373]: info alarm notifier reading: 2.75 % usage - /var/log
2022-06-15T03:56:08.727 controller-0 collectd[2848373]: info alarm notifier reading: 26.43 % usage - /var/lib/docker
2022-06-15T03:56:08.728 controller-0 collectd[2848373]: info alarm notifier reading: 15.12 % usage - /var/lib/rabbitmq
2022-06-15T03:56:08.729 controller-0 collectd[2848373]: info alarm notifier reading: 10.44 % usage - /var/lib/docker-distribution
2022-06-15T03:56:08.729 controller-0 collectd[2848373]: info alarm notifier reading: 0.53 % usage - /var/lib/postgresql
2022-06-15T03:56:37.794 controller-0 collectd[2848373]: info alarm notifier reading: 0.00 % usage - /dev
2022-06-15T03:56:37.794 controller-0 collectd[2848373]: info alarm notifier reading: 48.93 % usage - /
2022-06-15T03:56:37.794 controller-0 collectd[2848373]: info alarm notifier reading: 38.29 % usage - /boot
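
The difference between the two logs above can be found by comparing the mount points that appear in the readings. A minimal Python sketch, assuming the "alarm notifier reading" line format quoted above; the function names are hypothetical and not part of collectd or StarlingX.

```python
import re

# Matches the "alarm notifier reading" lines quoted in this report.
READING_RE = re.compile(r"alarm notifier reading: ([\d.]+) % usage - (\S+)\s*$")


def parse_readings(log_text):
    """Return {mount_point: percent_used} for each reading line in log_text."""
    readings = {}
    for line in log_text.splitlines():
        match = READING_RE.search(line)
        if match:
            readings[match.group(2)] = float(match.group(1))
    return readings


def missing_mount_points(reference_log, candidate_log):
    """Mount points reported in reference_log but absent from candidate_log."""
    reference = set(parse_readings(reference_log))
    candidate = set(parse_readings(candidate_log))
    return sorted(reference - candidate)
```

Applied to the CentOS and Debian excerpts above, this would list /opt/backups, /opt/etcd, /opt/extension, /opt/platform, /scratch and /var/lib/nova/instances. It cannot reveal /var or /boot/efi, since those filesystems do not exist on CentOS and so never appear in the reference log.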

Mount point name changes from CentOS to Debian
MountPoint "/scratch" --> now /var/rootdirs/scratch
MountPoint "/opt/etcd" --> now /var/rootdirs/opt/etcd
MountPoint "/opt/platform" --> now /var/rootdirs/opt/platform
MountPoint "/opt/extension" --> now /var/rootdirs/opt/extension
MountPoint "/opt/backups" --> now /var/rootdirs/opt/backups
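
The renames listed above all follow one pattern: the CentOS path is moved under /var/rootdirs. A small Python sketch of that mapping, limited to the five paths named in this report; the constant and function names are hypothetical, and other paths are assumed to stay unchanged.

```python
# Paths this report lists as relocated under /var/rootdirs on Debian.
ROOTDIRS_PREFIX = "/var/rootdirs"
RELOCATED_ON_DEBIAN = (
    "/scratch",
    "/opt/etcd",
    "/opt/platform",
    "/opt/extension",
    "/opt/backups",
)


def debian_mount_point(centos_path):
    """Map a CentOS mount point to its Debian equivalent.

    Assumption: only the paths listed in this report are relocated;
    anything else is returned unchanged.
    """
    if centos_path in RELOCATED_ON_DEBIAN:
        return ROOTDIRS_PREFIX + centos_path
    return centos_path
```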

Alarms
------
none

Test Activity
-------------
System Engineering

Workaround
----------
Periodically poll filesystem usage with the df command
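
A rough sketch of such a poll in Python, using the standard library's shutil.disk_usage rather than parsing df output. The function names are hypothetical, and the percentage is computed as used/total, so it may differ slightly from the Use% column that df prints.

```python
import os
import shutil


def usage_percent(path):
    """Return the percentage of the filesystem at path that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a size of zero.
        return 0.0
    return 100.0 * usage.used / usage.total


def poll(paths):
    """Return {path: percent_used} for each path that is a mount point."""
    readings = {}
    for path in paths:
        # Skip paths that are missing or not mounted on this system.
        if os.path.ismount(path):
            readings[path] = usage_percent(path)
    return readings
```

On a Debian build, the list passed to poll would include the /var/rootdirs paths together with /var and /boot/efi; running it from cron or a timer is left to the operator.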

Changed in starlingx:
assignee: nobody → Gerry Kopec (gerry-kopec)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to monitoring (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/monitoring/+/847108

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
tags: added: stx.debian stx.monitor
OpenStack Infra (hudson-openstack) wrote : Fix merged to monitoring (master)

Reviewed: https://review.opendev.org/c/starlingx/monitoring/+/847108
Committed: https://opendev.org/starlingx/monitoring/commit/c3d70afe3f383f4b21fa7b62399043f3857fa0b7
Submitter: "Zuul (22348)"
Branch: master

commit c3d70afe3f383f4b21fa7b62399043f3857fa0b7
Author: Gerry Kopec <email address hidden>
Date: Tue Jun 21 13:30:35 2022 -0400

    Update collectd disk usage checks for debian

    Update filesystems tracked by collectd:
     - new filesystems in debian: /var, /boot/efi
     - filesystems with different paths in debian: /opt/etcd, /opt/platform,
       /opt/extension, /opt/backups, /scratch
     - add /opt/platform-backup to both centos and debian

    Test Plan:
    PASS: Centos: verify correct filesystems are monitored by collectd
    PASS: Debian: verify correct filesystems are monitored by collectd
    PASS: Add additional files to monitored filesystem until usage hit 80%
          and then 90%. Verify that major and critical alarms are triggered
    PASS: Remove additional files and verify that alarm clears when usage
          drops below 80%

    Closes-Bug: 1979367
    Signed-off-by: Gerry Kopec <email address hidden>
    Change-Id: Ifc063c29ab825a748516302b231e95ff353e94aa

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.8.0