influxdb writing data files to rootfs

Bug #1827301 reported by Gerry Kopec
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Eric MacDonald

Bug Description

Brief Description
-----------------
Rootfs usage on the active controller grew continuously over a week of steady-state operation. Total growth was about 840 MB, and it came from the /var/lib/influxdb/data/collectd/collectd samples and /var/lib/influxdb/data/_internal/monitor directories. In general, sample data should not be stored in the rootfs; where it is, its size should be engineered to a fixed limit so that the rootfs cannot eventually run out of space.

Severity
--------
Major

Steps to Reproduce
------------------
- install system
- periodically monitor rootfs disk usage via "df -m" and "du -x --max-depth=3 /"
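
The periodic check in the steps above could be scripted roughly as follows. This is a minimal sketch, not the procedure used for the report: the function names `used_mb` and `sample_growth` and the sampling interval are illustrative assumptions, and it reads filesystem totals through Python's `shutil.disk_usage` instead of parsing `df -m` output.

```python
import shutil
import time


def used_mb(path="/"):
    """Return used space on the filesystem holding `path`, in MiB."""
    return shutil.disk_usage(path).used // (1024 * 1024)


def sample_growth(path="/", samples=3, interval_s=1.0):
    """Take `samples` readings `interval_s` seconds apart.

    Returns (readings, growth), where growth is last reading minus first.
    """
    readings = []
    for i in range(samples):
        readings.append(used_mb(path))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings, readings[-1] - readings[0]


if __name__ == "__main__":
    # Short interval for demonstration only; a week-long observation
    # would use something like one reading per hour.
    readings, growth = sample_growth("/", samples=3, interval_s=1.0)
    print(f"readings (MiB): {readings}, growth: {growth} MiB")
```

A steadily positive growth value across many readings would correspond to the behavior described in this report; `du -x --max-depth=3 /` is still needed to find which directory is responsible.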

Expected Behavior
------------------
rootfs usage should remain stable.

Actual Behavior
----------------
rootfs usage is steadily increasing over time

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Multi-node (2+10)

Branch/Pull Time/Commit
-----------------------
cengn load: 20190421T233001Z

Last Pass
---------
New test

Timestamp/Logs
--------------
Relevant files found in /var/lib/influxdb/data/
controller-1:/var/lib/docker/containers# date; ls -rltR /var/lib/influxdb/data/*
Thu May 2 00:44:00 UTC 2019
/var/lib/influxdb/data/_internal:
total 4
drwx------ 2 influxdb influxdb 4096 May 2 00:00 monitor

/var/lib/influxdb/data/_internal/monitor:
total 246492
-rw-r--r-- 1 influxdb influxdb 16777216 Apr 24 23:59 4
-rw-r--r-- 1 influxdb influxdb 33554432 Apr 26 00:00 6
-rw-r--r-- 1 influxdb influxdb 33554432 Apr 26 23:59 8
-rw-r--r-- 1 influxdb influxdb 33554432 Apr 27 23:59 10
-rw-r--r-- 1 influxdb influxdb 67108864 Apr 29 00:00 12
-rw-r--r-- 1 influxdb influxdb 67108864 Apr 29 23:59 14
-rw-r--r-- 1 influxdb influxdb 67108864 Apr 30 23:59 16
-rw-r--r-- 1 influxdb influxdb 67108864 May 2 00:00 18
-rw-r--r-- 1 influxdb influxdb 4194304 May 2 00:44 20

/var/lib/influxdb/data/collectd:
total 4
drwx------ 2 influxdb influxdb 4096 May 2 00:00 collectd samples

/var/lib/influxdb/data/collectd/collectd samples:
total 620140
-rw-r--r-- 1 influxdb influxdb 33554432 Apr 24 00:00 2
-rw-r--r-- 1 influxdb influxdb 134217728 Apr 25 00:00 3
-rw-r--r-- 1 influxdb influxdb 134217728 Apr 26 00:00 5
-rw-r--r-- 1 influxdb influxdb 134217728 Apr 27 00:00 7
-rw-r--r-- 1 influxdb influxdb 134217728 Apr 28 00:01 9
-rw-r--r-- 1 influxdb influxdb 134217728 Apr 29 00:02 11
-rw-r--r-- 1 influxdb influxdb 134217728 Apr 30 00:00 13
-rw-r--r-- 1 influxdb influxdb 134217728 May 1 00:00 15
-rw-r--r-- 1 influxdb influxdb 134217728 May 2 00:00 17
-rw-r--r-- 1 influxdb influxdb 8388608 May 2 00:43 19

actual disk usage per file:
controller-1:/var/lib/docker/containers# date; du -ma /var/lib/influxdb/data/
Thu May 2 00:44:09 UTC 2019
32 /var/lib/influxdb/data/collectd/collectd samples/2
72 /var/lib/influxdb/data/collectd/collectd samples/3
72 /var/lib/influxdb/data/collectd/collectd samples/5
71 /var/lib/influxdb/data/collectd/collectd samples/7
72 /var/lib/influxdb/data/collectd/collectd samples/9
72 /var/lib/influxdb/data/collectd/collectd samples/11
71 /var/lib/influxdb/data/collectd/collectd samples/13
69 /var/lib/influxdb/data/collectd/collectd samples/15
72 /var/lib/influxdb/data/collectd/collectd samples/17
8 /var/lib/influxdb/data/collectd/collectd samples/19
606 /var/lib/influxdb/data/collectd/collectd samples
606 /var/lib/influxdb/data/collectd
14 /var/lib/influxdb/data/_internal/monitor/4
19 /var/lib/influxdb/data/_internal/monitor/6
23 /var/lib/influxdb/data/_internal/monitor/8
28 /var/lib/influxdb/data/_internal/monitor/10
33 /var/lib/influxdb/data/_internal/monitor/12
37 /var/lib/influxdb/data/_internal/monitor/14
42 /var/lib/influxdb/data/_internal/monitor/16
47 /var/lib/influxdb/data/_internal/monitor/18
3 /var/lib/influxdb/data/_internal/monitor/20
241 /var/lib/influxdb/data/_internal/monitor
241 /var/lib/influxdb/data/_internal
847 /var/lib/influxdb/data/
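
The two listings above disagree for the same files: `ls` shows 134217728 bytes (128 MiB) for file 3, while `du -m` reports 72 MB. One possible explanation, which is an inference and not something stated in this report, is that the files are sparse or preallocated, so their apparent size exceeds the blocks actually allocated. The sketch below shows how both numbers can be read for a single file; `apparent_and_allocated` is a hypothetical helper name, and `st_blocks` is only available on Unix-like systems.

```python
import os


def apparent_and_allocated(path):
    """Return (apparent size, allocated size) in bytes for one file.

    Apparent size is what `ls -l` prints; allocated size is what `du`
    bases its figure on.
    """
    st = os.stat(path)
    # On POSIX systems st_blocks is counted in 512-byte units,
    # regardless of the filesystem's own block size.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        apparent, allocated = apparent_and_allocated(name)
        print(f"{name}: apparent={apparent} B, allocated={allocated} B")
```

For rootfs capacity planning the allocated figure is the one that matters, which is why the 847 MB total from `du` is the number to track.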

controller-1 went active around 20:00 on Apr. 23 and remained active.

Logs available on request.

Test Activity
-------------
System Engineering

Ghada Khalil (gkhalil) wrote :

Marking as release gating because the rootfs fills up over time. To mitigate this issue, the retention period for influxdb will be reduced.

tags: added: stx.2.0 stx.config
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Eric MacDonald (rocksolidmtce)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/658200

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/658200
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=904da6755d92855a34c2f675391769c52965caaf
Submitter: Zuul
Branch: master

commit 904da6755d92855a34c2f675391769c52965caaf
Author: Eric MacDonald <email address hidden>
Date: Thu May 9 15:46:58 2019 -0400

    Reduce the collectd samples retention period

    Collectd creates a samples database within the
    InfluxDB database which is stored in the rootfs.

    The current 4 week retention period is too long
    for larger systems and could lead to the rootfs
    filling up.

    This update reduces that retention period to 1 week
    to protect the rootfs from being filled up with
    sample data until the samples database is moved
    to a more appropriate location.

    Change-Id: Ic59712849fa228f19d15919594d23edc43109a0b
    Closes-Bug: 1827301
    Signed-off-by: Eric MacDonald <email address hidden>
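
As a rough illustration of what the retention change means for disk usage, the per-file `du` figures for the eight full-day collectd sample files in the report (files 3 through 17) can be averaged and scaled to each retention window. This is a back-of-the-envelope sketch that assumes roughly one file per day at a constant rate on this 2+10 system; the function name `steady_state_mb` is hypothetical, and the estimate covers only the collectd samples, not the _internal/monitor data.

```python
# Per-file `du -m` figures (MB) for collectd sample files 3, 5, 7, 9, 11,
# 13, 15 and 17, copied from the report. Files 2 and 19 are partial days
# and are left out.
FULL_DAY_MB = [72, 72, 71, 72, 72, 71, 69, 72]


def steady_state_mb(daily_mb, retention_days):
    """Estimate disk usage once a retention window of `retention_days` is full.

    Assumes one file per day and a constant average daily size.
    """
    average_per_day = sum(daily_mb) / len(daily_mb)
    return average_per_day * retention_days


four_weeks = steady_state_mb(FULL_DAY_MB, 28)
one_week = steady_state_mb(FULL_DAY_MB, 7)
print(f"4-week retention: about {four_weeks:.1f} MB")
print(f"1-week retention: about {one_week:.1f} MB")
```

Under these assumptions the samples database would settle near 2 GB with the 4 week retention period and near 0.5 GB with the 1 week period. Larger systems would produce more data per day, which is the concern the commit message raises.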

Changed in starlingx:
status: In Progress → Fix Released