Secondary controller is administratively locked
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Critical | Al Bailey |
Bug Description
Brief Description
-----------------
During the provisioning part of StarlingX using 20190609T233000Z, the secondary controller is administratively locked automatically.
Severity
--------
Critical. stx-openstack cannot be applied unless all nodes are online.
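Since the apply is gated on every host being unlocked/enabled/available, a pre-check along these lines can guard the apply step. This is a minimal sketch: the awk column positions are assumptions based on the `system host-list` table shown later in this report, and the script is a no-op when the sysinv CLI is not on PATH.

```shell
#!/bin/sh
# Sketch: gate `system application-apply stx-openstack` on every
# host in `system host-list` reporting unlocked/enabled/available.
# Column positions assumed from the table layout in this report.
not_ready() {
  awk -F'|' '/^\| *[0-9]/ {
    gsub(/ /, "", $3); gsub(/ /, "", $5)
    gsub(/ /, "", $6); gsub(/ /, "", $7)
    if ($5 != "unlocked" || $6 != "enabled" || $7 != "available")
      print $3
  }'
}

# Only talk to a live system when the sysinv CLI is present.
if command -v system >/dev/null 2>&1; then
  bad=$(system host-list | not_ready)
  if [ -n "$bad" ]; then
    echo "hosts not ready: $bad" >&2
    exit 1
  fi
  system application-apply stx-openstack
fi
```

With the host-list from this report, the check would flag controller-1 and refuse the apply.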
Steps to Reproduce
------------------
Following the wiki setup steps: during the apply of stx-openstack, controller-1 (the secondary controller) is administratively locked automatically.
Expected Behavior
------------------
controller-1 remains unlocked.
Actual Behavior
----------------
controller-1 is automatically (administratively) locked and stays locked.
Reproducibility
---------------
100%
System Configuration
--------------------
Duplex and Controller storage
Branch/Pull Time/Commit
-----------------------
20190609T233000Z
Last Pass
---------
This didn't happen with the CENGN ISO from 20190604T144018Z.
Timestamp/Logs
--------------
A full collect log from a standard configuration is attached. I couldn't find any relevant error message stating why controller-1 was automatically locked. Here are the messages:
System host-list:
[wrsroot@
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | available    |
| 2  | compute-0    | worker      | unlocked       | enabled     | available    |
| 3  | compute-1    | worker      | unlocked       | enabled     | available    |
| 4  | controller-1 | controller  | locked         | disabled    | online       |
+----+--------------+-------------+----------------+-------------+--------------+
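One way to track controller-1's administrative state (and retry an unlock) is sketched below. The awk column positions are assumptions based on the `system host-show` Property/Value table layout; `system host-unlock` is the standard sysinv unlock command, and the script is a no-op without the CLI.

```shell
#!/bin/sh
# Extract the value of the "administrative" property from a
# `system host-show <host>` Property/Value table on stdin.
admin_state() {
  awk -F'|' '$2 ~ /^ *administrative *$/ { gsub(/ /, "", $3); print $3 }'
}

# Only run against a live system when the sysinv CLI is present.
if command -v system >/dev/null 2>&1; then
  state=$(system host-show controller-1 | admin_state)
  echo "controller-1 administrative: $state"
  if [ "$state" = "locked" ]; then
    # Try to bring the node back; if maintenance locks it again,
    # that should show up in the mtce logs.
    system host-unlock controller-1
  fi
fi
```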
From fm alarm-list:
- 800.011 | Loss of replication in replication group group-0: OSDs are down | cluster= 44b6-84c3- group-0.
- 800.001 | Storage Alarm Condition: HEALTH_WARN [PGs are degraded/stuck or undersized]. Please check 'ceph -s' for more details. | cluster= 44b6-84c3-
- 400.002 | Service group oam-services loss of redundancy; expected 1 standby member but no standby members available | service_
- 400.002 | Service group controller-services loss of redundancy; expected 1 standby member but no standby members available | service_
- 400.002 | Service group cloud-services loss of redundancy; expected 1 standby member but no standby members available | service_
- 400.002 | Service group vim-services loss of redundancy; expected 1 standby member but no standby members available | service_
- 400.002 | Service group patching-services loss of redundancy; expected 1 standby member but no standby members available | service_
- 400.002 | Service group directory-services loss of redundancy; expected 2 active members but only 1 active member available | service_
- 200.001 | controller-1 was administratively locked to take it out-of-service. | host=controller-1 | warning | 2019-06-10T02:56:17.981421
- 400.002 | Service group web-services loss of redundancy; expected 2 active members but only 1 active member available | service_
- 400.002 | Service group storage-services loss of redundancy; expected 2 active members but only 1 active member available | service_
- 400.002 | Service group storage- standby members available | service_ services
- 250.001 | controller-1 Configuration is out-of-date. | host=controller-1 | major | 2019-06-10T02:00:37.098627
- 250.001 | controller-0 Configuration is out-of-date. | host=controller-0 | major | 2019-06-10T02:00:37.035417
The detail of the 200.001 alarm is as follows:
[wrsroot@
+--------------------+---------------------------------------------------------------------+
| Property           | Value                                                               |
+--------------------+---------------------------------------------------------------------+
| alarm_id           | 200.001                                                             |
| alarm_state        | set                                                                 |
| alarm_type         | operational-                                                        |
| degrade_affecting  | False                                                               |
| entity_instance_id | host=controller-1                                                   |
| entity_type_id     | system.host                                                         |
| mgmt_affecting     | True                                                                |
| probable_cause     | out-of-service                                                      |
| proposed_          |                                                                     |
| reason_text        | controller-1 was administratively locked to take it out-of-service. |
| service_affecting  | True                                                                |
| severity           | warning                                                             |
| suppression        | False                                                               |
| suppression_status | unsuppressed                                                        |
| timestamp          | 2019-06-                                                            |
| uuid               | 07fb0079-                                                           |
+--------------------+---------------------------------------------------------------------+
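The alarm only records that the lock happened, not who issued it, so the inventory and maintenance logs are the place to look. A minimal grep sketch follows; the log paths `/var/log/sysinv.log` and `/var/log/mtcAgent.log` are the usual StarlingX controller locations, assumed here rather than confirmed by the report.

```shell
#!/bin/sh
# Pull lines mentioning a lock action on controller-1 out of the
# inventory and maintenance logs (paths assumed, see above).
lock_lines() {
  grep -i 'controller-1.*lock'
}

for f in /var/log/sysinv.log /var/log/mtcAgent.log; do
  if [ -r "$f" ]; then
    echo "== $f =="
    lock_lines < "$f" || true   # no matches is not an error here
  fi
done
```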
Test Activity
-------------
Sanity
According to the alarms, Ceph might be related. Another thing I noticed on the locked controller-1 is that the Calico interfaces are not created; is this expected because the node is locked?
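Both observations can be checked quickly from the node. This is a sketch: `ceph -s` is the health check the 800.001 alarm itself suggests, and the `cali` interface-name prefix is the usual Calico convention, assumed here rather than confirmed by the report.

```shell
#!/bin/sh
# Ceph health, as alarm 800.001 suggests checking:
if command -v ceph >/dev/null 2>&1; then
  ceph -s
fi

# Count Calico interfaces on the node; Calico veth names
# conventionally start with "cali" (assumption, see above).
count_cali() {
  grep -c ': cali' || true   # grep -c exits 1 on zero matches
}
if command -v ip >/dev/null 2>&1; then
  echo "calico interfaces: $(ip -o link | count_cali)"
fi
```

On the locked controller-1 described above, the interface count would presumably come back as 0.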