250.001 alarm raised and not clearing

Bug #1859845 reported by Wendy Mitchell on 2020-01-15
This bug affects 1 person
Affects: StarlingX
Importance: Medium
Assigned to: yong hu

Bug Description

Brief Description
-----------------
The 250.001 alarm "Configuration is out-of-date" does not clear, even after a lock/unlock operation, on a single-host system (controller/worker).

Severity
--------
Minor

Steps to Reproduce
------------------
1. The test case creates flavors and launches instances on the controller/worker.
2. It reboots the host with the reboot -f command.
3. It waits for the reboot to succeed and for the host to reach the state:
   ['controller-0'] have reached state(s): {'availability': ['available', 'degraded']}

Note: The alarm was raised @ 2020-01-15T16:06:29.734 and did not clear

The test then deletes the instances and flavors that were created and checks the alarms.

300 seconds later @ [2020-01-15 16:18:13,883] the 250.001 alarm is still there
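
The teardown check described above (wait up to 300 seconds for the alarm to disappear) can be sketched as a small polling helper. This is only an illustration, not the actual test framework code: `wait_for_alarm_clear`, the `list_alarms` callable and the `(alarm_id, entity)` tuple shape are hypothetical names chosen for this sketch, and how the active alarm list is actually fetched is left to the caller.

```python
import time
from typing import Callable, Iterable, Tuple

# Hypothetical alarm shape for this sketch: (alarm_id, entity_instance_id),
# e.g. ("250.001", "host=controller-0").
Alarm = Tuple[str, str]


def wait_for_alarm_clear(
    list_alarms: Callable[[], Iterable[Alarm]],
    alarm_id: str,
    entity: str,
    timeout: float = 300.0,
    interval: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once (alarm_id, entity) is no longer active, False on timeout.

    list_alarms is an assumed caller-supplied function that returns the
    currently active alarms; clock and sleep are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        if (alarm_id, entity) not in set(list_alarms()):
            return True  # alarm is gone
        if clock() >= deadline:
            return False  # still active when the timeout expired
        sleep(interval)
```

In the failure reported here, such a helper would return False for ("250.001", "host=controller-0") because the alarm is still active when the timeout expires.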

Expected Behavior
------------------
The alarm "Configuration is out-of-date" should clear.

Actual Behavior
----------------
The alarm "Configuration is out-of-date" (250.001) is never cleared (even after a lock/unlock operation).

Reproducibility
---------------
yes

(failed teardown in test
nova/test_evacuate_vms.py::TestOneHostAvail::test_reboot_only_host)

System Configuration
--------------------
Tested on a single-node system

Branch/Pull Time/Commit
-----------------------
20200111T023000Z

Last Pass
---------

Timestamp/Logs
--------------

See Fm-manager.log at the point where the alarm first appears:

2020-01-15T16:06:29.734 fmMsgServer.cpp(398): Raising Alarm/Log, (250.001) (host=controller-0)

2020-01-15T16:06:29.735 fmMsgServer.cpp(421): Alarm created/updated: (250.001) (host=controller-0) (3) (58e6c0f9-097a-4167-b586-61466fd3c934)

2020-01-15T16:06:29.735 fmMsgServer.cpp(430): Send response for create fault, uuid:(58e6c0f9-097a-4167-b586-61466fd3c934) (0)

See Fm-manager.log after the lock/unlock attempt:

2020-01-15T16:36:56.327 fmMsgServer.cpp(421): Alarm created/updated: (250.001) (host=controller-0) (3) (7d0feb43-50e4-4117-a4ef-2218da2e04f0)

2020-01-15T16:36:56.327 fmMsgServer.cpp(430): Send response for create fault, uuid:(7d0feb43-50e4-4117-a4ef-2218da2e04f0) (0)
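
The two "Alarm created/updated" entries quoted above carry different UUIDs. A minimal sketch for pulling those fields out of such lines follows; the regular expression is written against the two lines in this report only and is an assumption about the log format, not a documented one. The third parenthesised number is matched but not interpreted, since its meaning is not stated here. `parse_created` and `SAMPLE` are names invented for this sketch.

```python
import re
from datetime import datetime

# Pattern inferred from the log lines quoted in this report (an assumption).
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"fmMsgServer\.cpp\(\d+\): Alarm created/updated: "
    r"\((?P<alarm_id>[\d.]+)\) \((?P<entity>[^)]*)\) \(\d+\) "
    r"\((?P<uuid>[0-9a-f-]+)\)"
)


def parse_created(lines):
    """Return one dict per 'Alarm created/updated' line; skip everything else."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        events.append({
            "time": datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f"),
            "alarm_id": m.group("alarm_id"),
            "entity": m.group("entity"),
            "uuid": m.group("uuid"),
        })
    return events


# The two entries copied from this report.
SAMPLE = [
    "2020-01-15T16:06:29.735 fmMsgServer.cpp(421): Alarm created/updated: "
    "(250.001) (host=controller-0) (3) (58e6c0f9-097a-4167-b586-61466fd3c934)",
    "2020-01-15T16:36:56.327 fmMsgServer.cpp(421): Alarm created/updated: "
    "(250.001) (host=controller-0) (3) (7d0feb43-50e4-4117-a4ef-2218da2e04f0)",
]

events = parse_created(SAMPLE)
for e in events:
    print(e["time"].isoformat(), e["alarm_id"], e["entity"], e["uuid"])
```

Run on the quoted lines, this yields two events for 250.001 on host=controller-0 with distinct UUIDs, roughly 30 minutes apart. Whether that means the alarm was cleared and raised again around the lock/unlock or simply re-created is not something these lines alone establish.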

Test Activity
-------------
LP retest

Workaround
----------

Wendy Mitchell (wmitchellwr) wrote :

lab: SM-2

Yang Liu (yliu12) on 2020-01-16
tags: added: stx.retestneeded
Ghada Khalil (gkhalil) wrote :

stx.4.0 / medium priority -- stale alarm, but seems to be reproducible

tags: added: stx.config
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → yong hu (yhu6)
Ghada Khalil (gkhalil) wrote :

Yong, is there somebody from your team who can look at this? It's not clear whether this is related to the launched VMs before the reboot or not.

yong hu (yhu6) wrote :

Ghada, I will have someone run a similar test without launching VMs and see whether any alarms are raised. In the meantime, please ask someone from the Flock team to check this issue too.

Ghada Khalil (gkhalil) on 2020-01-17
tags: added: stx.4.0
zhipeng liu (zhipengs) wrote :

Hi Ghada,

Do we have any update from the Flock team? Thanks!

Zhipeng
