Storage node configuration failed after performing DOR

Bug #1884184 reported by Senthil Mukundakumar
| Affects   | Status    | Importance | Assigned to  | Milestone |
| StarlingX | Won't Fix | Medium     | Daniel Safta |           |

Bug Description

Brief Description
-----------------
After powering down and then powering up all nodes in a large system (dead office recovery, DOR), the storage node storage-0 remains in the failed state.

| 200.011 | storage-0 experienced a configuration failure. | host=storage-0 | critical | 2020-06-18T23:10:42.765067 |
| 200.004 | storage-0 experienced a service-affecting failure. Auto-recovery in progress. Manual Lock and Unlock may be required if auto-recovery is unsuccessful. | host=storage-0 | critical | 2020-06-18T21:38:08.843009 |

[sysadmin@controller-1 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | compute-0 | worker | unlocked | enabled | available |
| 3 | compute-1 | worker | unlocked | enabled | available |
| 4 | compute-10 | worker | unlocked | enabled | available |
| 5 | compute-11 | worker | unlocked | enabled | available |
| 6 | compute-12 | worker | unlocked | enabled | available |
| 7 | compute-13 | worker | unlocked | enabled | available |
| 8 | compute-14 | worker | unlocked | enabled | available |
| 9 | compute-15 | worker | unlocked | enabled | available |
| 10 | compute-16 | worker | unlocked | enabled | available |
| 11 | compute-17 | worker | unlocked | enabled | available |
| 12 | compute-18 | worker | unlocked | enabled | available |
| 13 | compute-2 | worker | unlocked | enabled | available |
| 14 | compute-3 | worker | unlocked | enabled | available |
| 15 | compute-4 | worker | unlocked | enabled | available |
| 16 | compute-5 | worker | unlocked | enabled | available |
| 17 | compute-6 | worker | unlocked | enabled | available |
| 18 | compute-7 | worker | unlocked | enabled | available |
| 19 | compute-8 | worker | unlocked | enabled | available |
| 20 | compute-9 | worker | unlocked | enabled | available |
| 21 | controller-1 | controller | unlocked | enabled | available |
| 22 | storage-0 | storage | unlocked | disabled | failed |
| 23 | storage-1 | storage | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
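The check for "every host is available after DOR" can be automated against captured `system host-list` output. The following is a minimal sketch, not part of the original report: the function names `parse_host_list` and `unavailable_hosts` are hypothetical, and `SAMPLE` is a shortened excerpt of the table above. It only parses text that has already been captured and does not call any StarlingX API.

```python
def parse_host_list(table_text):
    """Parse the ASCII table printed by `system host-list` into a list of dicts.

    Hypothetical helper: assumes the first '|' row is the header row and
    that every later '|' row is a host entry.
    """
    rows = []
    header = None
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders, shell prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def unavailable_hosts(rows):
    """Return hostnames whose availability column is not 'available'."""
    return [r["hostname"] for r in rows if r.get("availability") != "available"]


# Shortened excerpt of the host list from this report.
SAMPLE = """\
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 22 | storage-0 | storage | unlocked | disabled | failed |
| 23 | storage-1 | storage | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
"""

if __name__ == "__main__":
    print(unavailable_hosts(parse_host_list(SAMPLE)))
```

Run against the excerpt, the script lists storage-0 as the only host that is not available, which matches the failure described in this report.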

Severity
--------
Medium

Steps to Reproduce
------------------
Power down all nodes, then power them up again.

Expected Behavior
------------------
All nodes should be in the available state after power-on.

Actual Behavior
----------------
The storage node storage-0 remains in the failed state.

Reproducibility
---------------
Reproduced in DOR testing

System Configuration
--------------------
2+2+20 (WCP_35_60)

Branch/Pull Time/Commit
-----------------------
2020-06-12_20-00-00

Last Pass
---------
https://files.starlingx.kube.cengn.ca/launchpad/1884184

Timestamp/Logs
--------------

Test Activity
-------------
System test

description: updated
Ghada Khalil (gkhalil) wrote :

Marked as stx.5.0 / medium priority: issue with a full dead office recovery (DOR) on a large system.

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.5.0 stx.storage
Changed in starlingx:
assignee: nobody → Stefan Dinescu (stefandinescu)
Ghada Khalil (gkhalil)
tags: added: stx.retestneeded
Frank Miller (sensfan22)
Changed in starlingx:
assignee: Stefan Dinescu (stefandinescu) → Daniel Safta (dsafta)
Frank Miller (sensfan22) wrote :

This issue has not been seen recently and was reported only once. If it starts occurring more often, please open a new LP with a recent load.

Changed in starlingx:
status: Triaged → Won't Fix
Ghada Khalil (gkhalil)
tags: removed: stx.retestneeded