After BnR, 800.001 ceph alarm appears after host lock/unlock

Bug #1867628 reported by Peng Peng
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Ovidiu Poncea

Bug Description

Brief Description
-----------------
After a successful backup and restore on an SX system, there is no 800.001 alarm on the system. After a host lock/unlock is done, alarm 800.001 "Storage Alarm Condition: HEALTH_WARN. Please check 'ceph -s' for more details." shows up and is never cleared.

Severity
--------
Major

Steps to Reproduce
------------------
1. Perform backup and restore on an SX system.
2. Lock and unlock the host.
3. After the lock/unlock, check the alarm list (a minimal check is sketched below).

TC-name: mtc/test_lock_unlock_host.py::test_lock_unlock_host[controller] (After BnR)
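
For quick manual verification, the check behind this TC boils down to listing active alarms and confirming 800.001 is absent. Below is a minimal sketch, assuming it runs on controller-0 with admin credentials already sourced in the environment (for example via /etc/platform/openrc); the subprocess-based helper is illustrative only and is not part of the actual test framework.

# Minimal sketch: list active alarms via the fm CLI and verify that the
# 800.001 storage alarm is not raised. Assumes admin credentials are
# already sourced in the shell environment.
import subprocess

def active_alarm_ids():
    """Return the set of alarm IDs reported by 'fm alarm-list'."""
    out = subprocess.run(
        ["fm", "alarm-list", "--nowrap", "--uuid"],
        capture_output=True, text=True, check=True,
    ).stdout
    ids = set()
    for line in out.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Data rows look like: | <uuid> | <alarm id> | <reason text> | ...
        if len(cols) > 2 and cols[2] and cols[2] != "Alarm ID":
            ids.add(cols[2])
    return ids

if __name__ == "__main__":
    alarms = active_alarm_ids()
    # Expected behavior after B&R plus lock/unlock: no storage alarm.
    assert "800.001" not in alarms, "800.001 storage alarm is present"
    print("No 800.001 alarm; active alarm IDs:", alarms or "none")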

Expected Behavior
------------------
no 800.001 alarm

Actual Behavior
----------------
The 800.001 alarm appears.

Reproducibility
---------------
Unknown - first time this has been seen in sanity; will monitor.

System Configuration
--------------------
One node system

Lab-name: wcp_112

Branch/Pull Time/Commit
-----------------------
2020-03-15_04-10-00

Last Pass
---------
unknown

Timestamp/Logs
--------------

[2020-03-16 09:35:33,092] 314 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[abcd:204::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2020-03-16 09:35:34,057] 436 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+---------------------------+-------------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+---------------------------+-------------------------------------+----------+----------------------------+
| a6fe60bb-f141-41ae-8a13-fa2d014c06c4 | 750.002 | Application Apply Failure | k8s_application=platform-integ-apps | major | 2020-03-16T09:29:34.467950 |
+--------------------------------------+----------+---------------------------+-------------------------------------+----------+----------------------------+

[2020-03-16 09:35:36,872] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[abcd:204::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock controller-0'

[2020-03-16 09:36:16,109] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[abcd:204::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock controller-0'

[2020-03-16 09:45:30,818] 314 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[abcd:204::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2020-03-16 09:45:32,403] 436 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+--------------------------------------------------------------------------------+----------------------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+--------------------------------------------------------------------------------+----------------------------------------------+----------+----------------------------+
| b90c4655-aab6-441b-a82c-64ff6f2892f6 | 800.001 | Storage Alarm Condition: HEALTH_WARN. Please check 'ceph -s' for more details. | cluster=38d7d5fb-48bd-4d0d-bfab-0813c303a055 | warning | 2020-03-16T09:44:37.714525 |
| 1c7cee9b-907e-4f42-931c-1a8e62c8590e | 100.114 | NTP configuration does not contain any valid or reachable NTP servers. | host=controller-0.ntp | major | 2020-03-16T09:43:57.745494 |
| 230ef514-9342-4b30-a489-53cfeaf76bcb | 100.114 | NTP address 64:ff9b::d806:246 is not a valid or a reachable NTP server. | host=controller-0.ntp=64:ff9b::d806:246 | minor | 2020-03-16T09:43:57.704690 |
| 0a8a04d0-8f7d-444a-b45a-732535a20741 | 100.114 | NTP address 64:ff9b::c7b6:dd6e is not a valid or a reachable NTP server. | host=controller-0.ntp=64:ff9b::c7b6:dd6e | minor | 2020-03-16T09:43:57.663727 |
| e66f1978-5198-41ee-aa23-94365a50c833 | 100.114 | NTP address 64:ff9b::a29f:c87b is not a valid or a reachable NTP server. | host=controller-0.ntp=64:ff9b::a29f:c87b | minor | 2020-03-16T09:43:57.622337 |
| a6fe60bb-f141-41ae-8a13-fa2d014c06c4 | 750.002 | Application Apply Failure | k8s_application=platform-integ-apps | major | 2020-03-16T09:29:34.467950 |
+--------------------------------------+----------+--------------------------------------------------------------------------------+----------------------------------------------+----------+----------------------------+
controller-0:~$

Test Activity
-------------
Sanity

Revision history for this message
Peng Peng (ppeng) wrote :
Ghada Khalil (gkhalil)
tags: added: stx.4.0 stx.update
tags: added: stx.storage
Revision history for this message
Yang Liu (yliu12) wrote :

controller-0:~$ ceph -s
  cluster:
    id: 38d7d5fb-48bd-4d0d-bfab-0813c303a055
    health: HEALTH_WARN
            1 chassis (1 osds) down
            1 osds down
            1 host (1 osds) down
            1 root (1 osds) down
            64 slow ops, oldest one blocked for 26482 sec, mon.controller-0 has slow ops

  services:
    mon: 1 daemons, quorum controller-0
    mgr: controller-0(active)
    osd: 1 osds: 0 up, 1 in

  data:
    pools: 0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage: 124 MiB used, 930 GiB / 930 GiB avail
    pgs:
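
For reference, the 800.001 alarm mirrors the overall ceph health shown above, so the same condition can be confirmed programmatically. A minimal sketch follows, assuming the ceph CLI is available on controller-0; this snippet is not from the original report.

# Minimal sketch: read the overall ceph health status (the 'ceph -s'
# health summary) in machine-readable form. HEALTH_WARN here corresponds
# to the raised 800.001 alarm.
import json
import subprocess

def ceph_health_status():
    """Return the overall health string, e.g. HEALTH_OK or HEALTH_WARN."""
    out = subprocess.run(
        ["ceph", "status", "--format", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)["health"]["status"]

if __name__ == "__main__":
    print(ceph_health_status())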

Revision history for this message
Ghada Khalil (gkhalil) wrote :

stx.4.0 / medium priority - issue related to a specific area: Backup & Restore functionality

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Ovidiu Poncea (ovidiu.poncea)
Yang Liu (yliu12)
tags: added: stx.retestneeded
Revision history for this message
Ovidiu Poncea (ovidiuponcea) wrote :

Fix released based on the B&R work done in the past few months.

Changed in starlingx:
status: Triaged → Fix Released
Revision history for this message
Peng Peng (ppeng) wrote :

This issue has not been reproduced or seen in recent B&R executions.

tags: removed: stx.retestneeded