100.104 alarm "File System threshold exceeded" raised and not cleared after stx-openstack app reapplied

Bug #1878673 reported by Peng Peng
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  Medium      Yuxing

Bug Description

Brief Description
-----------------
After stx-openstack was reapplied, the 100.104 alarm "File System threshold exceeded ; threshold 80.00%, actual 80.38%" was raised and was not cleared.

Severity
--------
Major

Steps to Reproduce
------------------
1. Reapply stx-openstack (system application-apply stx-openstack).
2. Check the alarm list (fm alarm-list).
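A minimal sketch of the reproduction check, assuming a Python harness like the one the test case runs in. `wait_for_app_status` and its `get_status` callable (which would wrap `system application-list` and return the stx-openstack status column) are hypothetical, not part of the StarlingX test framework. The idea is to check alarms only after the app has actually reached `applied`, with a timeout long enough for the lab.

```python
import time
from typing import Callable


def wait_for_app_status(
    get_status: Callable[[], str],
    target: str = "applied",
    timeout_s: float = 3600.0,  # arbitrary default, for illustration only
    poll_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_status() until it returns `target` or timeout_s elapses.

    get_status is a hypothetical callable; sleep and clock are injectable so
    the loop can be exercised without real waiting.
    Returns True only if `target` was seen before the deadline.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == target:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

Only after this returns True would the harness list alarms (for example with `fm alarm-list`) and look for a lingering 100.104.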

TC-name: z_containers/test_openstack_services.py::test_stx_openstack_helm_override_update_and_reset

Expected Behavior
------------------
No 100.104 alarm is raised.

Actual Behavior
----------------
The 100.104 alarm is raised and is not cleared.

Reproducibility
---------------
Unknown. This is the first time the issue has been seen in sanity; will monitor.

System Configuration
--------------------
One-node system

Lab-name: wcp_122

Branch/Pull Time/Commit
-----------------------
2020-05-13_20-00-00

Last Pass
---------
Load: 2020-04-22_00-10-00

Timestamp/Logs
--------------
[2020-05-14 14:26:10,465] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-apply stx-openstack'

[2020-05-14 14:26:12,466] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2020-05-14 14:26:14,053] 436 DEBUG MainThread ssh.expect :: Output:
+--------------------------+-----------------------------+-----------------------------------+----------------------------------------+----------+-----------+
| application              | version                     | manifest name                     | manifest file                          | status   | progress  |
+--------------------------+-----------------------------+-----------------------------------+----------------------------------------+----------+-----------+
| cert-manager             | 1.0-0                       | cert-manager-manifest             | certmanager-manifest.yaml              | applied  | completed |
| nginx-ingress-controller | 1.0-0                       | nginx-ingress-controller-manifest | nginx_ingress_controller_manifest.yaml | applied  | completed |
| oidc-auth-apps           | 1.0-0                       | oidc-auth-manifest                | manifest.yaml                          | uploaded | completed |
| platform-integ-apps      | 1.0-8                       | platform-integration-manifest     | manifest.yaml                          | applied  | completed |
| stx-openstack            | 1.0-1-centos-stable-        | armada-manifest                   | stx-openstack.yaml                     | applying | None      |
|                          | versioned                   |                                   |                                        |          |           |
+--------------------------+-----------------------------+-----------------------------------+----------------------------------------+----------+-----------+

[2020-05-14 14:32:22,366] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2020-05-14 14:32:23,847] 436 DEBUG MainThread ssh.expect :: Output:
+--------------------------+-----------------------------+-----------------------------------+----------------------------------------+----------+-----------+
| application              | version                     | manifest name                     | manifest file                          | status   | progress  |
+--------------------------+-----------------------------+-----------------------------------+----------------------------------------+----------+-----------+
| cert-manager             | 1.0-0                       | cert-manager-manifest             | certmanager-manifest.yaml              | applied  | completed |
| nginx-ingress-controller | 1.0-0                       | nginx-ingress-controller-manifest | nginx_ingress_controller_manifest.yaml | applied  | completed |
| oidc-auth-apps           | 1.0-0                       | oidc-auth-manifest                | manifest.yaml                          | uploaded | completed |
| platform-integ-apps      | 1.0-8                       | platform-integration-manifest     | manifest.yaml                          | applied  | completed |
| stx-openstack            | 1.0-1-centos-stable-        | armada-manifest                   | stx-openstack.yaml                     | applied  | completed |
|                          | versioned                   |                                   |                                        |          |           |
|                          |                             |                                   |                                        |          |           |
+--------------------------+-----------------------------+-----------------------------------+----------------------------------------+----------+-----------+

[2020-05-14 14:32:23,953] 314 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2020-05-14 14:32:25,053] 436 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+-------------------------------------------------------------------+----------------------------------------------+----------+----------------------------+
| UUID                                 | Alarm ID | Reason Text                                                       | Entity ID                                    | Severity | Time Stamp                 |
+--------------------------------------+----------+-------------------------------------------------------------------+----------------------------------------------+----------+----------------------------+
| b211106c-9ac3-46e7-8265-89b2d3118d9c | 100.101  | Platform CPU threshold exceeded ; threshold 95.00%, actual 95.19% | host=controller-0                            | critical | 2020-05-14T14:31:05.221033 |
| 6f6aabbb-436b-4ddc-b823-8f7955ce6a91 | 100.104  | File System threshold exceeded ; threshold 80.00%, actual 80.38%  | host=controller-0.filesystem=/var/lib/docker | major    | 2020-05-14T14:27:05.585897 |
+--------------------------------------+----------+-------------------------------------------------------------------+----------------------------------------------+----------+----------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$
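The alarm table above can also be checked programmatically. This is a hypothetical helper, not part of the StarlingX test suite: it assumes the `+---+` / `|`-delimited layout shown in the log, takes the first `|` row as the header, and does not merge wrapped continuation rows.

```python
from typing import Dict, List


def parse_cli_table(text: str) -> List[Dict[str, str]]:
    """Parse a '+---+' / '|'-delimited table such as the alarm list above.

    The first '|' row is taken as the header; every later '|' row becomes a
    dict keyed by header name. Wrapped continuation rows are not merged.
    """
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip '+----+' borders, blank lines and shell prompts
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if not header:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def alarms_with_id(rows: List[Dict[str, str]], alarm_id: str) -> List[Dict[str, str]]:
    """Return the rows whose 'Alarm ID' column equals alarm_id (e.g. '100.104')."""
    return [row for row in rows if row.get("Alarm ID") == alarm_id]
```

For example, `alarms_with_id(parse_cli_table(output), "100.104")` on the alarm-list output above would return the single `/var/lib/docker` row.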

Test Activity
-------------
Sanity

Peng Peng (ppeng) wrote :
tags: added: stx.retestneeded
Brent Rowsell (brent-rowsell) wrote :

Which filesystem is reporting the alarm ?

Ghada Khalil (gkhalil) wrote :

From the alarm text, the impacted filesystem appears to be /var/lib/docker on controller-0 (entity ID host=controller-0.filesystem=/var/lib/docker).
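A sketch of how such an entity ID could be split into its parts, assuming the `key=value` pairs are joined by `.` as in `host=controller-0.filesystem=/var/lib/docker`. The function name is hypothetical, and the separator rule is inferred from this one example; splitting only on a `.` that is followed by the next `key=` keeps dots inside values intact.

```python
import re
from typing import Dict

# Split on a '.' only when it is followed by the next 'key=' token, so dots
# inside a value (e.g. a hostname or path) are left alone.
_PAIR_SEPARATOR = re.compile(r"\.(?=[A-Za-z_][\w-]*=)")


def parse_entity_id(entity_id: str) -> Dict[str, str]:
    """Turn 'host=controller-0.filesystem=/var/lib/docker' into a dict."""
    result: Dict[str, str] = {}
    for part in _PAIR_SEPARATOR.split(entity_id):
        key, _, value = part.partition("=")
        result[key] = value
    return result


print(parse_entity_id("host=controller-0.filesystem=/var/lib/docker"))
# {'host': 'controller-0', 'filesystem': '/var/lib/docker'}
```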

Ghada Khalil (gkhalil) wrote :

We may need to increase the default size of the docker filesystem.

tags: added: stx.4.0 stx.containers
Changed in starlingx:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Yuxing (yuxing)
Peng Peng (ppeng) wrote :

Issue was reproduced on
Lab: WCP_122
Load: 2020-06-03_20-00-00

Logs are available at:
https://files.starlingx.kube.cengn.ca/launchpad/1878673

Yuxing (yuxing) wrote :

I also observed this alarm on an all-in-one simplex node after applying stx-openstack. However, usage of /var/lib/docker is only 57%. Investigating this issue.

Yuxing (yuxing) wrote :

Tested with lab wcp_122

Usage of /var/lib/docker (df -h columns: Filesystem, Size, Used, Avail, Use%, Mounted on):

Before applying stx-openstack:
/dev/mapper/cgts--vg-docker--lv 30G 16G 15G 53% /var/lib/docker

Peak during the apply:
/dev/mapper/cgts--vg-docker--lv 30G 26G 5.0G 84% /var/lib/docker

After the apply completed:
/dev/mapper/cgts--vg-docker--lv 30G 19G 12G 62% /var/lib/docker
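The three samples show why the alarm appears and then clears: only the peak crosses the 80% major threshold quoted in the alarm text. A small sketch, assuming the standard `df -h` column order; note that `df` reports Use% as a whole percent, so it only approximates the value FM reports (for example 80.38%). The helper names are illustrative.

```python
from typing import Tuple

# Column order of one `df -h` data line:
# Filesystem  Size  Used  Avail  Use%  Mounted-on
MAJOR_THRESHOLD_PCT = 80  # from the 100.104 alarm text in this report


def parse_df_line(line: str) -> Tuple[str, int]:
    """Return (mount_point, use_percent) from a single `df -h` data line."""
    fields = line.split()
    return fields[5], int(fields[4].rstrip("%"))


def exceeds_threshold(line: str, threshold_pct: int = MAJOR_THRESHOLD_PCT) -> bool:
    """True when the reported Use% is above the alarm threshold."""
    _, use_pct = parse_df_line(line)
    return use_pct > threshold_pct


samples = {
    "before": "/dev/mapper/cgts--vg-docker--lv 30G 16G 15G 53% /var/lib/docker",
    "peak": "/dev/mapper/cgts--vg-docker--lv 30G 26G 5.0G 84% /var/lib/docker",
    "complete": "/dev/mapper/cgts--vg-docker--lv 30G 19G 12G 62% /var/lib/docker",
}

# Only the "peak" sample is reported as ALARM.
for stage, line in samples.items():
    mount, pct = parse_df_line(line)
    print(f"{stage:<9} {mount} {pct}% {'ALARM' if exceeds_threshold(line) else 'ok'}")
```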

Yuxing (yuxing) wrote :

The alarm is cleared after the apply process completes.

The whole process started at 2020-06-11T18:22:59.148804+00:00 and ended at 2020-06-11T19:01:47.838849+00:00, roughly 39 minutes. Looking at the test case, a 6-minute wait may not be long enough to apply stx-openstack in this lab.
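The start and end timestamps quoted in the comment can be turned into a duration directly with Python's `datetime`. The 6-minute budget is the figure cited for the test case; the helper itself is only illustrative.

```python
from datetime import datetime, timedelta


def apply_duration(start_iso: str, end_iso: str) -> timedelta:
    """Elapsed time between two ISO 8601 timestamps that carry UTC offsets."""
    return datetime.fromisoformat(end_iso) - datetime.fromisoformat(start_iso)


elapsed = apply_duration(
    "2020-06-11T18:22:59.148804+00:00",
    "2020-06-11T19:01:47.838849+00:00",
)
test_budget = timedelta(minutes=6)  # wait time cited for the test case

print(elapsed)                # 0:38:48.690045
print(elapsed > test_budget)  # True
```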

Tee Ngo (teewrs) wrote :

This is a documentation task. The user should increase the size of the docker filesystem before applying the stx-openstack app. The default allocation should not be changed because, like any non-platform app, stx-openstack is applied optionally.
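As a rough illustration of what sizing up involves, the wcp_122 measurement can be used to estimate the smallest docker filesystem that keeps peak usage at or below the 80% threshold. This is a back-of-the-envelope sketch, not StarlingX sizing guidance: it assumes the roughly 10G growth seen in one lab (16G used before the apply, 26G at peak), uses the rounded `df -h` figures, and `min_fs_size_gib` is a hypothetical helper.

```python
import math


def min_fs_size_gib(baseline_used_gib: float, apply_growth_gib: float,
                    threshold_pct: float = 80, headroom_gib: float = 0) -> int:
    """Smallest whole-GiB size that keeps peak usage at or below threshold_pct.

    peak = baseline + growth (+ optional headroom);
    size >= peak / (threshold_pct / 100), rounded up to a whole GiB.
    """
    peak_gib = baseline_used_gib + apply_growth_gib + headroom_gib
    return math.ceil(peak_gib * 100 / threshold_pct)


# wcp_122: 16G used before the apply, 26G at peak -> about 10G of growth
needed = min_fs_size_gib(16, 10)
print(needed)       # 33
print(needed > 30)  # True: the 30G docker filesystem in that lab was too small
```

Adding `headroom_gib` leaves margin for images or other apps that the single measurement does not capture.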

Frank Miller (sensfan22) wrote :

I agree with Tee's assessment. The documentation should be updated to state that the stx-openstack application requires additional space in the docker filesystem, to prevent alarms from occurring while the application is being applied.

tags: added: stx.5.0
removed: stx.4.0
Frank Miller (sensfan22) wrote :

Removed the stx.4.0 tag as this is a minor issue.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to docs (master)

Fix proposed to branch: master
Review: https://review.opendev.org/742462

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to docs (master)

Reviewed: https://review.opendev.org/742462
Committed: https://git.openstack.org/cgit/starlingx/docs/commit/?id=513e25e884ca81a9e24c874909c7a8f1b4662715
Submitter: Zuul
Branch: master

commit 513e25e884ca81a9e24c874909c7a8f1b4662715
Author: Yuxing Jiang <email address hidden>
Date: Wed Jul 22 11:06:11 2020 -0400

    Add enlarge docker file system size instruction

    This commit adds one step to enlarge the size of dock_lv file system to
    prevent the installation of stx-openstack fills up the file system.

    Change-Id: I61c91d759c43d2b829c8db4b0061c21314b9ce43
    Closes-Bug: 1878673
    Signed-off-by: Yuxing Jiang <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Peng Peng (ppeng) wrote :

We have not seen this issue for a while.

tags: removed: stx.retestneeded