A large number of files generated by filebeat pod are not removed
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | High | Kevin Smith | | |
Bug Description
Brief Description
-----------------
The following issue was observed in a distributed cloud configuration. The /var/log partition filled up because of the space taken by a large number of deleted files that filebeat still held open.
Severity
--------
Critical
Steps to Reproduce
------------------
Set up a large distributed cloud with stx-monitor applied and soak for a few days with some test activities such as deploying, managing/unmanaging and removing subclouds.
Expected Behavior
------------------
Service logs are saved to disk and rotated accordingly.
Actual Behavior
----------------
The logmgmt process was hogging CPU and no logs were flushed to disk. Log files were rotated rapidly with almost no content, and a filesystem critical alarm was generated.
The problem documented here (courtesy of Al Bailey)
https:/
might be the cause of this issue
Reproducibility
---------------
Seen once
System Configuration
-------
IPv6 Distributed Cloud
Branch/Pull Time/Commit
-------
Feb 22 master code
Last Pass
---------
N/A
Timestamp/Logs
--------------
As logs were not flushed to disk, there are
See the attached list of deleted files, produced by running the command "sudo lsof|grep deleted".
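
For anyone trying to confirm the same symptom on another system, the lsof check above can be approximated by walking /proc directly. This is only a minimal sketch, not part of the original report: the function name find_deleted_open_files and its proc_root parameter are made up for illustration, and it assumes a Linux /proc layout where the fd link of a deleted-but-open file ends in " (deleted)".

```python
import os
from collections import defaultdict

# Linux appends this marker to the link target of an fd whose file was unlinked.
DELETED_SUFFIX = " (deleted)"


def find_deleted_open_files(proc_root="/proc"):
    """Return {pid: [(path, size_bytes), ...]} for deleted files still held open.

    Hypothetical helper: roughly what "lsof | grep deleted" reports,
    grouped by PID. Needs root to see the fds of other users' processes.
    """
    result = defaultdict(list)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are process directories
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or permission denied
        for fd in fds:
            fd_path = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(fd_path)
            except OSError:
                continue  # fd was closed while we were scanning
            if not target.endswith(DELETED_SUFFIX):
                continue
            try:
                # On a real /proc, stat() through the fd link still reaches
                # the unlinked inode, so this is the space it is holding.
                size = os.stat(fd_path).st_size
            except OSError:
                size = 0
            path = target[: -len(DELETED_SUFFIX)]
            result[int(entry)].append((path, size))
    return dict(result)
```

Summing the sizes per PID from the returned mapping would give a rough idea of which process (for example filebeat) is holding the most space under /var/log.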
Test Activity
-------------
Evaluation
Workaround
----------
Kill logmgmt process and delete filebeat pods.
stx.4.0 / high priority - stx-monitor resulting in running out of log space on distributed cloud