DX config controller-1 failed, INSYNC in false status

Bug #2035277 reported by Peng Peng
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Chris Friesen
Milestone: (none)

Bug Description

Brief Description
-----------------
DX system initial config of controller-1 failed, with INSYNC in false status.

Workaround: manually fixed things up and re-ran "kubeadm join".
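
For reference, "kubeadm join" here refers to the standard upstream kubeadm workflow. The sketch below only illustrates that generic flow with placeholder values; the exact recovery steps performed on this lab were not captured in this report.

# On the active controller (controller-0): create a fresh bootstrap token and
# print the corresponding join command.
kubeadm token create --print-join-command

# On controller-1: re-run the printed command. A second control-plane node would
# additionally need --control-plane and a certificate key. All values below are
# placeholders.
kubeadm join <apiserver-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>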

Severity
--------
Major

Steps to Reproduce
------------------
DX Deployment Manager (DM) config of controller-1

TC-name:

Expected Behavior
------------------
DX DM config of controller-1 succeeds

Actual Behavior
----------------
DX DM config of controller-1 fails

Reproducibility
---------------
This is the first time this issue has been seen.

System Configuration
--------------------
Two-node system

Lab-name: SM_5-6

Branch/Pull Time/Commit
-----------------------
Job: STX_build_debian_master
Build ID: 20230910T060000Z

Last Pass
---------
20230904T060000Z

Timestamp/Logs
--------------
[2023-09-12 16:48:56,547] 349 DEBUG MainThread ssh.send :: Send 'kubectl get hosts -n=deployment -o=wide'
[2023-09-12 16:48:56,597] 551 DEBUG MainThread ssh.exec_cmd:: Expecting \[.*@controller\-[01] .*\(keystone_admin\)\]\$ in prompt
[2023-09-12 16:48:56,661] 471 DEBUG MainThread ssh.expect :: Output:
NAME            ADMINISTRATIVE   OPERATIONAL   AVAILABILITY   PROFILE                INSYNC   SCOPE       RECONCILED
controller-0    unlocked         enabled       available      controller-0-profile   true     bootstrap   true
controller-1    unlocked         disabled      offline        controller-0-profile   false    bootstrap   false

[2023-09-12 16:49:25,032] 349 DEBUG MainThread ssh.send :: Send 'fm --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2023-09-12 16:49:25,082] 551 DEBUG MainThread ssh.exec_cmd:: Expecting \[.*@controller\-[01] .*\(keystone_admin\)\]\$ in prompt
[2023-09-12 16:49:27,542] 471 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+----------+----------------------------+
| f1461281-034c-4ecc-b741-bf9f1ba38e55 | 250.001 | controller-1 Configuration is out-of-date. (applied: 7a86fdd5-fae1-460b-bb78-3e06cc890341 target: cd7d648a-f0d5-4433-a1f6-eb7fe2ff625c) | host=controller-1 | major | 2023-09-12T16:10:30.354532 |
| 92b2d036-7226-439a-9597-3cbd0641d21e | 200.004 | controller-1 experienced a service-affecting failure. Auto-recovery in progress. Manual Lock and Unlock may be required if auto-recovery is unsuccessful. | host=controller-1 | critical | 2023-09-12T16:06:18.767276 |
| e103f00e-e0af-472c-bf49-04de02368317 | 200.011 | controller-1 experienced a configuration failure. | host=controller-1 | critical | 2023-09-12T16:06:18.716507 |
| d847ddbf-8a7b-41e4-935f-f6b21cf4830f | 800.011 | Loss of replication in replication group group-0: no OSDs | cluster=a0539d19-9f7a-4086-b502-faf11533c777.peergroup=group-0.host=controller-1 | major | 2023-09-12T15:35:40.053351 |
| 964fd76e-815d-4b7f-a4d4-eea33e671752 | 400.002 | Service group web-services loss of redundancy; expected 2 active members but only 1 active member available | service_domain=controller.service_group=web-services | major | 2023-09-12T15:35:00.457362 |
| e5dd39ba-0045-4929-9795-e3580a213bf4 | 400.002 | Service group controller-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=controller-services | major | 2023-09-12T15:34:51.687316 |
| 740ecf2b-1ba0-40ec-8c4a-ba7ca087cb8e | 400.002 | Service group vim-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=vim-services | major | 2023-09-12T15:34:51.491325 |
| 29944748-687a-401d-a67f-3b5a8c55dee0 | 400.002 | Service group cloud-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=cloud-services | major | 2023-09-12T15:34:39.349296 |
| 38952301-ab6a-48fd-a913-c33be5df7b42 | 400.002 | Service group storage-services loss of redundancy; expected 2 active members but only 1 active member available | service_domain=controller.service_group=storage-services | major | 2023-09-12T15:34:38.407315 |
| 5c4d39e1-312c-46b3-8aee-5ddb03a6f71a | 400.002 | Service group oam-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=oam-services | major | 2023-09-12T15:34:38.235327 |
| 5538f052-8c7c-496c-9d06-406ae05fb07f | 400.002 | Service group patching-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=patching-services | major | 2023-09-12T15:34:37.567335 |
| 79b6ce34-9636-4176-aaad-f4876e11bcaf | 400.002 | Service group directory-services loss of redundancy; expected 2 active members but only 1 active member available | service_domain=controller.service_group=directory-services | major | 2023-09-12T15:34:37.399290 |
| 2447569a-ccb8-4830-bd5d-662db32fb6c0 | 400.002 | Service group storage-monitoring-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=storage-monitoring-services | major | 2023-09-12T15:34:36.873307 |
| 5cbe1123-106d-4da7-9bcc-582af874f629 | 400.005 | Communication failure detected with peer over port vlan160 on host controller-0 | host=controller-0.network=mgmt | major | 2023-09-12T15:34:36.707289 |
| 26fae809-dac5-4c6d-a659-e9295b906b1e | 400.005 | Communication failure detected with peer over port eno1 on host controller-0 | host=controller-0.network=oam | major | 2023-09-12T15:34:36.474391 |
| 06cd23bf-170b-4632-8a9c-a0167e6e219c | 400.005 | Communication failure detected with peer over port vlan160 on host controller-0 | host=controller-0.network=cluster-host | major | 2023-09-12T15:34:36.238757 |
+--------------------------------------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+----------+----------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$

Automation log:
http://128.224.150.21/auto_logs/sys_install/stx/sm_5_6/202309121031/TIS_AUTOMATION.log

collect log:

Test Activity
-------------
installation

Revision history for this message
Peng Peng (ppeng) wrote:
summary: - DX DM config controller-1 failed, INSYNC in false status
+ DX config controller-1 failed, INSYNC in false status
description: updated
Revision history for this message
Ghada Khalil (gkhalil) wrote (last edit):

Marking as High since this is causing a red sanity for DX deployments.

Changed in starlingx:
assignee: nobody → Chris Friesen (cbf123)
tags: added: stx.9.0 stx.containers
Changed in starlingx:
importance: Undecided → High
Revision history for this message
Ghada Khalil (gkhalil) wrote (last edit):

Issue introduced by: https://review.opendev.org/c/starlingx/stx-puppet/+/893091
Revert has been merged on Sept 15: https://review.opendev.org/c/starlingx/stx-puppet/+/895364

Setting this LP to Fix Released based on the revert.

Changed in starlingx:
status: New → Fix Released