AIO-DX: false alarm not cleared regarding to File System threshold exceeded

Bug #1814944 reported by mhg
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Eric MacDonald

Bug Description

Brief Description
-----------------
An alarm 'File System threshold exceeded; threshold 90 %, actual 92%' was raised during testing against 'host=controller-0.filesystem=/opt/cgcs'.
After the active controller changed to controller-1 and '/opt/cgcs' was no longer mounted on controller-0, the alarm remained.

Severity
--------
Minor

Steps to Reproduce
------------------
1 create an image with sizes (in this case a windows server 2016):
    "virtual-size": 31138512896,
    "actual-size": 8039108608,
2 create a cinder volume using the image
3 boot a VM from the volume
4 do migrations
5 do host-swact
6 delete the VM, volume

Expected Behavior
------------------
The alarm regarding filesystem usage above threshold should be cleared after space usage drops back below the threshold.

Actual Behavior
----------------
The alarm regarding filesystem usage above threshold was not cleared. It should be cleared once the actual usage drops below the threshold. After the swact in particular, the alarm is no longer relevant because '/opt/cgcs' is no longer mounted on controller-0.
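
The expected behavior described above can be sketched as a small decision function. This is only an illustration of the rule the report asks for, not the actual StarlingX or collectd implementation; `FsSample` and `alarm_should_be_raised` are hypothetical names, and the 90% default is taken from the alarm text.

```python
from dataclasses import dataclass


@dataclass
class FsSample:
    """One observation of a filesystem on a host (hypothetical structure)."""
    host: str
    path: str
    mounted: bool        # is the filesystem currently mounted on this host?
    used_percent: float  # ignored when the filesystem is not mounted


def alarm_should_be_raised(sample: FsSample, threshold: float = 90.0) -> bool:
    """Return True only if the filesystem is mounted and over the threshold."""
    if not sample.mounted:
        # Nothing to measure on this host, so an earlier alarm is stale.
        return False
    return sample.used_percent > threshold


# Values from the report: 92% used on controller-0 before the swact,
# and no /opt/cgcs mount on controller-0 afterwards.
before_swact = FsSample("controller-0", "/opt/cgcs", mounted=True, used_percent=92.0)
after_swact = FsSample("controller-0", "/opt/cgcs", mounted=False, used_percent=0.0)
```

Under this rule the alarm for controller-0 would be asserted before the swact and cleared after it, which is the behavior the report expects.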

Reproducibility
---------------
[Reproducible/Intermittent]

System Configuration
--------------------
Two node system

Branch/Pull Time/Commit
-----------------------
StarlingX_Upstream as of 2019-02-05_20-18-00

Timestamp/Logs
--------------

2019-02-06 11:45:06

Ghada Khalil (gkhalil) wrote :

Marking as release gating; likely related to the collectd feature

Changed in starlingx:
assignee: nobody → Eric MacDonald (rocksolidmtce)
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.2019.05 stx.metal
Ken Young (kenyis)
Changed in starlingx:
assignee: Eric MacDonald (rocksolidmtce) → Cindy Xie (xxie1)
Changed in starlingx:
assignee: Cindy Xie (xxie1) → chen haochuan (martin1982)
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05
Ghada Khalil (gkhalil)
tags: added: stx.retestneeded
chen haochuan (martin1982) wrote :

The issue still reproduces on the latest codebase. These are simpler reproduction steps:

1, deploy a duplex system
2, copy files to /opt/cgcs until the file system alarm is raised on the active controller
3, swact, then log in to the new active controller
4, check with "fm alarm-list": there are two alarm entries, one for controller-0 and one for controller-1
5, delete the files in /opt/cgcs; the alarm for the new active controller is cleared
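
When checking step 4, it can help to flag which alarm entries refer to a filesystem that is not mounted on the named host. The sketch below parses entity IDs of the form shown in this report ('host=<name>.filesystem=<path>'). The `mounts` mapping is a hypothetical input that would have to be gathered separately on each controller; nothing here calls the `fm` CLI.

```python
def parse_entity(entity_id):
    """Split 'host=<name>.filesystem=<path>' into (host, path)."""
    host_part, sep, path = entity_id.partition(".filesystem=")
    if not sep or not host_part.startswith("host="):
        raise ValueError(f"unexpected entity id: {entity_id!r}")
    return host_part[len("host="):], path


def stale_alarms(entity_ids, mounts):
    """Return entity ids whose filesystem is not mounted on the named host.

    `mounts` maps host name -> set of mounted paths (hypothetical input).
    """
    stale = []
    for entity_id in entity_ids:
        host, path = parse_entity(entity_id)
        if path not in mounts.get(host, set()):
            stale.append(entity_id)
    return stale


alarms = [
    "host=controller-0.filesystem=/opt/cgcs",
    "host=controller-1.filesystem=/opt/cgcs",
]
# Assumption: after the swact only controller-1 mounts /opt/cgcs.
mounts = {"controller-0": set(), "controller-1": {"/opt/cgcs"}}
print(stale_alarms(alarms, mounts))
# prints ['host=controller-0.filesystem=/opt/cgcs']
```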

Changed in starlingx:
assignee: chen haochuan (martin1982) → nobody
assignee: nobody → chen haochuan (martin1982)
Changed in starlingx:
assignee: chen haochuan (martin1982) → Eric MacDonald (rocksolidmtce)
Changed in starlingx:
status: Triaged → Fix Released
Ghada Khalil (gkhalil) wrote :
mhg (marvinhg) wrote :

The problem was not reproduced during retest.

The alarm "File System threshold exceeded" was cleared after the disk usage of /opt/cgcs dropped back to 1% (its previous level).

Details:
 - Shortly after a 20G file was created under /opt/cgcs, which is 20G in total, an alarm was triggered
      "File System threshold exceeded ; threshold 90.00%, actual 99.91%"

      host=controller-1.filesystem=/opt/cgcs

 - The offending controller became 'degraded'

 - After the disk usage was reduced (by removing the 20G test file under /opt/cgcs), the alarm cleared and the
     controller became 'available'.
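
For anyone repeating this retest, a minimal way to watch usage cross the 90% threshold is sketched below using Python's standard `shutil.disk_usage`. The helper names are hypothetical, and the percentage computed here may differ slightly from what `df` or the platform's own monitoring reports.

```python
import shutil


def used_percent(path):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def over_threshold(percent, threshold=90.0):
    """True if usage is above the alarm threshold (90% in this report)."""
    return percent > threshold


# Example: on a controller, pass "/opt/cgcs" instead of "." to check that mount.
current = used_percent(".")
print(f"{current:.2f}% used, over threshold: {over_threshold(current)}")
```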

tags: removed: stx.retestneeded
mhg (marvinhg) wrote :

The test was on a 2-node AIO-DX system, with load:
###
### StarlingX
### Built from master
###

OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20190612T013000Z"

JOB="STX_build_master_master"
<email address hidden>"
BUILD_NUMBER="142"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-06-12 01:30:00 +0000"
