After BnR, 800.001 ceph alarm appears after host lock/unlock

Bug #1867628 reported by Peng Peng
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Ovidiu Poncea

Bug Description

Brief Description
-----------------
After a successful backup and restore on an SX system, there is no 800.001 alarm on the system. After a host lock/unlock is done, alarm 800.001 "Storage Alarm Condition: HEALTH_WARN. Please check 'ceph -s' for more details." shows up and is never cleared.

Severity
--------
Major

Steps to Reproduce
------------------
1. Perform backup and restore on an SX system.
2. Lock and unlock the host.
3. After the lock/unlock, check the alarm list (a minimal check is sketched below).

TC-name: mtc/test_lock_unlock_host.py::test_lock_unlock_host[controller] (After BnR)
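
For quick manual verification, the check behind this TC boils down to listing active alarms and confirming 800.001 is absent. Below is a minimal sketch, assuming it runs on controller-0 with admin credentials already sourced in the environment (for example via /etc/platform/openrc); the subprocess-based helper is illustrative only and is not part of the actual test framework.

# Minimal sketch: list active alarms via the fm CLI and verify that the
# 800.001 storage alarm is not raised. Assumes admin credentials are
# already sourced in the shell environment.
import subprocess

def active_alarm_ids():
    """Return the set of alarm IDs reported by 'fm alarm-list'."""
    out = subprocess.run(
        ["fm", "alarm-list", "--nowrap", "--uuid"],
        capture_output=True, text=True, check=True,
    ).stdout
    ids = set()
    for line in out.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Data rows look like: | <uuid> | <alarm id> | <reason text> | ...
        if len(cols) > 2 and cols[2] and cols[2] != "Alarm ID":
            ids.add(cols[2])
    return ids

if __name__ == "__main__":
    alarms = active_alarm_ids()
    # Expected behavior after B&R plus lock/unlock: no storage alarm.
    assert "800.001" not in alarms, "800.001 storage alarm is present"
    print("No 800.001 alarm; active alarm IDs:", alarms or "none")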

Expected Behavior
------------------
no 800.001 alarm

Actual Behavior
----------------
The 800.001 alarm appears.

Reproducibility
---------------
Unknown - first time this has been seen in sanity; will monitor.

System Configuration
--------------------
One node system

Lab-name: wcp_112

Branch/Pull Time/Commit
-----------------------
2020-03-15_04-10-00

Last Pass
---------
unknown

Timestamp/Logs
--------------

[2020-03-16 09:35:33,092] 314 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[abcd:204::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2020-03-16 09:35:34,057] 436 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+---------------------------+-------------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+---------------------------+-------------------------------------+----------+----------------------------+
| a6fe60bb-f141-41ae-8a13-fa2d014c06c4 | 750.002 | Application Apply Failure | k8s_application=platform-integ-apps | major | 2020-03-16T09:29:34.467950 |
+--------------------------------------+----------+---------------------------+-------------------------------------+----------+----------------------------+

[2020-03-16 09:35:36,872] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[abcd:204::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock controller-0'

[2020-03-16 09:36:16,109] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[abcd:204::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock controller-0'

[2020-03-16 09:45:30,818] 314 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[abcd:204::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2020-03-16 09:45:32,403] 436 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+--------------------------------------------------------------------------------+----------------------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+--------------------------------------------------------------------------------+----------------------------------------------+----------+----------------------------+
| b90c4655-aab6-441b-a82c-64ff6f2892f6 | 800.001 | Storage Alarm Condition: HEALTH_WARN. Please check 'ceph -s' for more details. | cluster=38d7d5fb-48bd-4d0d-bfab-0813c303a055 | warning | 2020-03-16T09:44:37.714525 |
| 1c7cee9b-907e-4f42-931c-1a8e62c8590e | 100.114 | NTP configuration does not contain any valid or reachable NTP servers. | host=controller-0.ntp | major | 2020-03-16T09:43:57.745494 |
| 230ef514-9342-4b30-a489-53cfeaf76bcb | 100.114 | NTP address 64:ff9b::d806:246 is not a valid or a reachable NTP server. | host=controller-0.ntp=64:ff9b::d806:246 | minor | 2020-03-16T09:43:57.704690 |
| 0a8a04d0-8f7d-444a-b45a-732535a20741 | 100.114 | NTP address 64:ff9b::c7b6:dd6e is not a valid or a reachable NTP server. | host=controller-0.ntp=64:ff9b::c7b6:dd6e | minor | 2020-03-16T09:43:57.663727 |
| e66f1978-5198-41ee-aa23-94365a50c833 | 100.114 | NTP address 64:ff9b::a29f:c87b is not a valid or a reachable NTP server. | host=controller-0.ntp=64:ff9b::a29f:c87b | minor | 2020-03-16T09:43:57.622337 |
| a6fe60bb-f141-41ae-8a13-fa2d014c06c4 | 750.002 | Application Apply Failure | k8s_application=platform-integ-apps | major | 2020-03-16T09:29:34.467950 |
+--------------------------------------+----------+--------------------------------------------------------------------------------+----------------------------------------------+----------+----------------------------+
controller-0:~$

Test Activity
-------------
Sanity

Revision history for this message
Peng Peng (ppeng) wrote :
Ghada Khalil (gkhalil)
tags: added: stx.4.0 stx.update
tags: added: stx.storage
Revision history for this message
Yang Liu (yliu12) wrote :

controller-0:~$ ceph -s
  cluster:
    id: 38d7d5fb-48bd-4d0d-bfab-0813c303a055
    health: HEALTH_WARN
            1 chassis (1 osds) down
            1 osds down
            1 host (1 osds) down
            1 root (1 osds) down
            64 slow ops, oldest one blocked for 26482 sec, mon.controller-0 has slow ops

  services:
    mon: 1 daemons, quorum controller-0
    mgr: controller-0(active)
    osd: 1 osds: 0 up, 1 in

  data:
    pools: 0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage: 124 MiB used, 930 GiB / 930 GiB avail
    pgs:
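
For reference, the 800.001 alarm mirrors the overall ceph health shown above, so the same condition can be confirmed programmatically. A minimal sketch follows, assuming the ceph CLI is available on controller-0; this snippet is not from the original report.

# Minimal sketch: read the overall ceph health status (the 'ceph -s'
# health summary) in machine-readable form. HEALTH_WARN here corresponds
# to the raised 800.001 alarm.
import json
import subprocess

def ceph_health_status():
    """Return the overall health string, e.g. HEALTH_OK or HEALTH_WARN."""
    out = subprocess.run(
        ["ceph", "status", "--format", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)["health"]["status"]

if __name__ == "__main__":
    print(ceph_health_status())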

Revision history for this message
Ghada Khalil (gkhalil) wrote :

stx.4.0 / medium priority - issue related to a specific area: Backup & Restore functionality

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Ovidiu Poncea (ovidiu.poncea)
Yang Liu (yliu12)
tags: added: stx.retestneeded
Revision history for this message
Ovidiu Poncea (ovidiuponcea) wrote :

Fix released based on the B&R work done in the past few months.

Changed in starlingx:
status: Triaged → Fix Released
Revision history for this message
Peng Peng (ppeng) wrote :

This issue has not been reproduced or seen in recent B&R executions.

tags: removed: stx.retestneeded