Activity log for bug #1865924

Date Who What changed Old value New value Message
2020-03-03 20:24:29 Tee Ngo bug added bug
2020-03-03 20:24:29 Tee Ngo attachment added deleted_files.txt https://bugs.launchpad.net/bugs/1865924/+attachment/5333163/+files/deleted_files.txt
2020-03-03 20:25:35 Tee Ngo description

Old value:

Brief Description
-----------------
The following issue was observed in a distributed cloud configuration. The /var/log partition filled up due to the space taken by a large number of deleted filebeat files.

Severity
--------
Critical

Steps to Reproduce
------------------
Set up a large distributed cloud with stx-monitor applied and soak for a few days with some test activities such as deploying, managing/unmanaging and removing subclouds.

Expected Behavior
------------------
Service logs are saved to disk and rotated accordingly.

Actual Behavior
----------------
The logmgmt process was hogging CPU and no logs were flushed to disk. Log files were rotated rapidly with almost no content and critical alarms were generated. The problem documented here (courtesy of Al Bailey) https://www.elastic.co/guide/en/beats/filebeat/master/faq-deleted-files-are-not-freed.html might be the cause of this issue.

Reproducibility
---------------
Seen once

System Configuration
--------------------
IPv6 Distributed Cloud

Branch/Pull Time/Commit
-----------------------
Feb 22 master code

Last Pass
---------
N/A

Timestamp/Logs
--------------
As logs were not flushed to disk, there are none. See the attached list of deleted files, produced by running the command "sudo lsof|grep deleted".

Test Activity
-------------
Evaluation

Workaround
----------
Kill the logmgmt process and delete the filebeat pods.

New value:

Brief Description
-----------------
The following issue was observed in a distributed cloud configuration. The /var/log partition filled up due to the space taken by a large number of deleted filebeat files.

Severity
--------
Critical

Steps to Reproduce
------------------
Set up a large distributed cloud with stx-monitor applied and soak for a few days with some test activities such as deploying, managing/unmanaging and removing subclouds.

Expected Behavior
------------------
Service logs are saved to disk and rotated accordingly.

Actual Behavior
----------------
The logmgmt process was hogging CPU and no logs were flushed to disk. Log files were rotated rapidly with almost no content and a filesystem critical alarm was generated. The problem documented here (courtesy of Al Bailey) https://www.elastic.co/guide/en/beats/filebeat/master/faq-deleted-files-are-not-freed.html might be the cause of this issue.

Reproducibility
---------------
Seen once

System Configuration
--------------------
IPv6 Distributed Cloud

Branch/Pull Time/Commit
-----------------------
Feb 22 master code

Last Pass
---------
N/A

Timestamp/Logs
--------------
As logs were not flushed to disk, there are none. See the attached list of deleted files, produced by running the command "sudo lsof|grep deleted".

Test Activity
-------------
Evaluation

Workaround
----------
Kill the logmgmt process and delete the filebeat pods.
2020-03-04 18:16:17 Ghada Khalil tags stx.4.0 stx.distcloud stx.monitor
2020-03-04 18:16:45 Ghada Khalil bug added subscriber Daniel Badea
2020-03-04 18:17:28 Frank Miller bug added subscriber Matt Peters
2020-03-04 18:17:41 Ghada Khalil starlingx: importance Undecided High
2020-03-04 18:17:42 Ghada Khalil starlingx: status New Triaged
2020-03-04 18:17:51 Ghada Khalil starlingx: assignee Kevin Smith (kevin.smith.wrs)
2020-03-19 23:03:33 OpenStack Infra starlingx: status In Progress Fix Released
2020-03-31 14:35:40 OpenStack Infra tags stx.4.0 stx.distcloud stx.monitor in-f-centos8 stx.4.0 stx.distcloud stx.monitor
2020-03-31 14:35:41 OpenStack Infra bug watch added https://github.com/kubernetes/kubernetes/issues/80745
2020-03-31 14:35:41 OpenStack Infra bug watch added https://github.com/kubernetes/kubernetes/issues/85334
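
The description relies on "sudo lsof|grep deleted" to list files that were deleted but are still held open. As a rough sketch of the same check, the Python below walks a Linux-style /proc tree and reports file descriptors whose link target ends in " (deleted)". The function name find_deleted_open_files and its proc_root parameter are hypothetical, introduced only for illustration; this is not part of StarlingX or filebeat, and it assumes a Linux host and enough privileges to read other processes' fd directories.

```python
import os
from pathlib import Path


def find_deleted_open_files(proc_root="/proc"):
    """Return (pid, fd, target, size_bytes) tuples for open descriptors
    whose underlying file has been unlinked.

    Hypothetical helper: proc_root is a parameter only so the function can
    be pointed at a test directory instead of the real /proc.
    """
    results = []
    for pid_dir in Path(proc_root).iterdir():
        # Only numeric entries under /proc are process directories.
        if not pid_dir.name.isdigit():
            continue
        try:
            fds = list((pid_dir / "fd").iterdir())
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(fd)
            except OSError:
                continue  # descriptor closed while we were scanning
            # The kernel appends " (deleted)" to the link target of an
            # unlinked file that is still open.
            if not target.endswith(" (deleted)"):
                continue
            try:
                # On a real /proc this follows the link to the open file,
                # so st_size is the space it still occupies.
                size = os.stat(fd).st_size
            except OSError:
                size = 0
            results.append((int(pid_dir.name), fd.name, target, size))
    return results
```

Calling find_deleted_open_files() as root and summing the last element of each tuple gives an estimate of how much space would be freed if the owning processes closed those descriptors or were restarted, which is what the workaround above does for the filebeat pods.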