alarm 800.001 raised on storage-0 lock is not cleared when storage-0 is unlocked
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| StarlingX | Won't Fix | Low | chen haochuan | |
Bug Description
Brief Description
-----------------
A lock/unlock of storage-0 unlocked storage-0 as expected, but alarm 800.001 did not clear.
| 800.001 | Storage Alarm Condition: HEALTH_WARN. Please check 'ceph -s' for more details. | cluster=
Severity
--------
standard
Steps to Reproduce
------------------
Step 1. Confirm with ceph -s that all OSDs are up and in.
@ [2019-09-14 23:01:19,546] all OSDs are up and in according to the ceph -s output:
osd: 6 osds: 6 up, 6 in
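The up/in check in Step 1 can be automated by parsing the OSD summary line of ceph -s. This is a minimal sketch: all_osds_up_and_in is a hypothetical helper (not part of the StarlingX test suite), and it assumes the "osd: N osds: U up, I in" layout shown above; other Ceph releases may format this line differently.

```python
import re

# Assumed layout of the ceph -s OSD summary, e.g. "osd: 6 osds: 6 up, 6 in".
# The lazy ".*?" skips any text between the "up" and "in" counts.
OSD_LINE = re.compile(r"osd:\s*(\d+)\s+osds:\s*(\d+)\s+up\b.*?(\d+)\s+in\b")


def all_osds_up_and_in(ceph_status_text: str) -> bool:
    """Return True if every OSD reported by ceph -s is both up and in."""
    match = OSD_LINE.search(ceph_status_text)
    if match is None:
        raise ValueError("no OSD summary line found in ceph -s output")
    total, up, in_count = (int(group) for group in match.groups())
    return total == up == in_count
```

With the output above, all_osds_up_and_in("osd: 6 osds: 6 up, 6 in") returns True; after storage-0 is locked (3 of 6 OSDs down) it would return False.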
Step 2. Lock storage-0:
[2019-09-14 23:01:22,925] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
Verify that the Ceph cluster health reflects the OSDs being down:
ceph -s @ [2019-09-14 23:01:43,527]
  cluster:
    id: 6231df84-
    health: HEALTH_WARN
            3 osds down
            1 host (3 osds) down
            Reduced data availability: 3 pgs inactive, 9 pgs peering
            too few PGs per OSD (19 < min 30)
            1/3 mons down, quorum controller-
| 2deecf77-
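Verifying that the cluster health reflects the down OSDs amounts to reading the health: line of ceph -s and the detail lines under it. A minimal sketch, assuming the layout pasted above (detail lines follow health: and end at a blank line or the next section header such as services:); parse_ceph_health is a hypothetical name, not an existing test-suite function.

```python
def parse_ceph_health(ceph_status_text: str) -> tuple[str, list[str]]:
    """Extract the health status and its detail lines from ceph -s text."""
    status = None
    details: list[str] = []
    for line in ceph_status_text.splitlines():
        stripped = line.strip()
        if status is None:
            # Skip everything before the "health:" line (e.g. "cluster:", "id:").
            if stripped.startswith("health:"):
                status = stripped.split(":", 1)[1].strip()
            continue
        # Assumption: details end at a blank line or a header ending in ":".
        if not stripped or stripped.endswith(":"):
            break
        details.append(stripped)
    if status is None:
        raise ValueError("no health line found in ceph -s output")
    return status, details
```

A test step could then assert that the status is HEALTH_WARN and that "3 osds down" is among the details while storage-0 is locked.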
Test Step 3: Check that alarms are raised when storage-0 is locked
Test Step 4: Check that OSDs are down
Test Step 5: Check that the loss of replication alarm is raised
Test Step 6: Check that the ceph health warning alarm is raised
Test Step 7: Attempt to lock the controller-0
Test Step 8: Attempt to force lock controller-0
Test Step 9: Attempt to lock the controller-1
Only 2 storage monitor available. At least 2 unlocked and enabled hosts with monitors are required. Please ensure hosts with monitors are unlocked and enabled.
Test Step 10: Attempt to force lock controller-1
Only 2 storage monitor available. At least 2 unlocked and enabled hosts with monitors are required. Please ensure hosts with monitors are unlocked and enabled.
Test Step 11: Unlock storage host storage-0 at this time:
[2019-09-14 23:02:10,244] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://
See the corresponding sysinv.log entry:
2019-09-14 23:02:14.024 106806 WARNING sysinv.
Step 10: Confirm that services are enabled and the host is unlocked
system host-show storage-0 indicates that the host is now unlocked and available:
| action | none |
| administrative | unlocked |
| availability | available |
| config_applied | e60c1692-
| boot_device | /dev/disk/
| peers | {u'hosts': [u'storage-0', u'storage-1'], u'name': u'group-0'} |
| operational | enabled |
| vim_progress_status | services-enabled |
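The unlocked/available/enabled confirmation above can be expressed as a check over the host-show table rows. This sketch assumes rows of the form "| property | value |" as pasted above (it tolerates a missing trailing pipe); parse_host_show and host_is_unlocked_and_enabled are hypothetical helpers.

```python
def parse_host_show(table_text: str) -> dict[str, str]:
    """Turn system host-show table rows into a {property: value} dict."""
    fields: dict[str, str] = {}
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and non-table lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Keep two-column rows; ignore the "Property | Value" header if present.
        if len(cells) >= 2 and cells[0] and cells[0] != "Property":
            fields[cells[0]] = cells[1]
    return fields


def host_is_unlocked_and_enabled(fields: dict[str, str]) -> bool:
    """True when the host is unlocked, available and operationally enabled."""
    return (
        fields.get("administrative") == "unlocked"
        and fields.get("availability") == "available"
        and fields.get("operational") == "enabled"
    )
```

Note that this only confirms the host state; as this report shows, it says nothing about whether alarm 800.001 has cleared.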
Expected Behavior
------------------
The 800.001 alarm that was raised when storage-0 was locked should have cleared after storage-0 was unlocked.
Actual Behavior
----------------
Alarm 800.001 was raised and did not clear; see the fm-event.log entries at 2019-09-14T23:06:47:
2019-09-
2019-09-
The alarm was still present at [2019-09-14 23:09:39,109]:
[2019-09-14 23:09:39,109] fm alarm-list
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+------
| f5496afd-
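Whether 800.001 is still active can be read from the Alarm ID column of the fm alarm-list table. A minimal sketch, assuming the column headers shown above; active_alarm_ids is a hypothetical helper, and it finds the column by its "Alarm ID" header rather than by position. Wrapped continuation rows (empty Alarm ID cell) are skipped.

```python
def active_alarm_ids(alarm_list_text: str) -> set[str]:
    """Collect the values of the Alarm ID column from fm alarm-list output."""
    column = None
    ids: set[str] = set()
    for line in alarm_list_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if column is None:
            # The first table row is assumed to be the header.
            if "Alarm ID" in cells:
                column = cells.index("Alarm ID")
            continue
        if len(cells) > column and cells[column]:
            ids.add(cells[column])
    return ids
```

Against the table above, "800.001" in active_alarm_ids(output) would still be True at 23:09:39.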
Reproducibility
---------------
weekly regression
System Configuration
--------------------
storage system WCP_113_121
Lab-name:
Branch/Pull Time/Commit
-----------------------
2019-09-13_20-09-52
Last Pass
---------
Timestamp/Logs
--------------
see inline
Test Activity
-------------
storage regression
tags: added: stx.retestneeded
tags: added: stx.3.0
Changed in starlingx: assignee: chen haochuan (martin1982) → nobody
Changed in starlingx: assignee: nobody → Linjia Chang (linjiachang)
Changed in starlingx: assignee: Linjia Chang (linjiachang) → chen haochuan (martin1982)
This caused the following test case to fail: test_ceph.py::test_lock_cont_check_mon_down (and also the next test case; see below).
Started at [2019-09-14 22:51:52,679]
FAIL ceph/test_ceph.py::test_lock_cont_check_mon_down
E Details: Timed out waiting for alarm 800.001 to disappear [2019-09-14 23:00:55,360]
FAIL ceph/test_ceph.py::test_storgroup_semantic_checks
E Details: Timed out waiting for alarm 800.001 to disappear [2019-09-14 23:09:17,136] 338
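Both failing test cases report "Timed out waiting for alarm 800.001 to disappear". The wait behind that message can be sketched as a polling loop with a deadline. This is not the test suite's actual implementation: wait_for_alarm_to_clear, its get_alarm_ids callback, and the default timeout and poll interval are all assumptions chosen for illustration.

```python
import time
from typing import Callable


def wait_for_alarm_to_clear(
    get_alarm_ids: Callable[[], set[str]],
    alarm_id: str = "800.001",
    timeout: float = 300.0,  # assumed default, not taken from the test suite
    poll_interval: float = 10.0,  # assumed default
) -> None:
    """Poll until alarm_id is no longer active, else raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        if alarm_id not in get_alarm_ids():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Timed out waiting for alarm {alarm_id} to disappear"
            )
        time.sleep(poll_interval)
```

Passing the alarm lookup as a callback keeps the loop testable without a live system; in a lab, the callback would run fm alarm-list and return the active alarm IDs. In this bug the alarm never clears after the unlock, so any such loop eventually raises.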