DX config controller-1 failed, INSYNC in false status

Bug #2035277 reported by Peng Peng
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Chris Friesen
Milestone: (none)

Bug Description

Brief Description
-----------------
DX system initial config of controller-1 failed, with INSYNC in false status.

Workaround: manually fixed things up and re-ran "kubeadm join".
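
For reference, "kubeadm join" here refers to the standard upstream kubeadm workflow. The sketch below only illustrates that generic flow with placeholder values; the exact recovery steps performed on this lab were not captured in this report.

# On the active controller (controller-0): create a fresh bootstrap token and
# print the corresponding join command.
kubeadm token create --print-join-command

# On controller-1: re-run the printed command. A second control-plane node would
# additionally need --control-plane and a certificate key. All values below are
# placeholders.
kubeadm join <apiserver-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>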

Severity
--------
Major

Steps to Reproduce
------------------
DX Deployment Manager (DM) config of controller-1

TC-name:

Expected Behavior
------------------
DX DM config of controller-1 succeeds

Actual Behavior
----------------
DX DM config of controller-1 fails

Reproducibility
---------------
This is the first time this issue has been seen.

System Configuration
--------------------
Two-node system

Lab-name: SM_5-6

Branch/Pull Time/Commit
-----------------------
Job: STX_build_debian_master
Build ID: 20230910T060000Z

Last Pass
---------
20230904T060000Z

Timestamp/Logs
--------------
[2023-09-12 16:48:56,547] 349 DEBUG MainThread ssh.send :: Send 'kubectl get hosts -n=deployment -o=wide'
[2023-09-12 16:48:56,597] 551 DEBUG MainThread ssh.exec_cmd:: Expecting \[.*@controller\-[01] .*\(keystone_admin\)\]\$ in prompt
[2023-09-12 16:48:56,661] 471 DEBUG MainThread ssh.expect :: Output:
NAME            ADMINISTRATIVE   OPERATIONAL   AVAILABILITY   PROFILE                INSYNC   SCOPE       RECONCILED
controller-0    unlocked         enabled       available      controller-0-profile   true     bootstrap   true
controller-1    unlocked         disabled      offline        controller-0-profile   false    bootstrap   false

[2023-09-12 16:49:25,032] 349 DEBUG MainThread ssh.send :: Send 'fm --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2023-09-12 16:49:25,082] 551 DEBUG MainThread ssh.exec_cmd:: Expecting \[.*@controller\-[01] .*\(keystone_admin\)\]\$ in prompt
[2023-09-12 16:49:27,542] 471 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+----------+----------------------------+
| f1461281-034c-4ecc-b741-bf9f1ba38e55 | 250.001 | controller-1 Configuration is out-of-date. (applied: 7a86fdd5-fae1-460b-bb78-3e06cc890341 target: cd7d648a-f0d5-4433-a1f6-eb7fe2ff625c) | host=controller-1 | major | 2023-09-12T16:10:30.354532 |
| 92b2d036-7226-439a-9597-3cbd0641d21e | 200.004 | controller-1 experienced a service-affecting failure. Auto-recovery in progress. Manual Lock and Unlock may be required if auto-recovery is unsuccessful. | host=controller-1 | critical | 2023-09-12T16:06:18.767276 |
| e103f00e-e0af-472c-bf49-04de02368317 | 200.011 | controller-1 experienced a configuration failure. | host=controller-1 | critical | 2023-09-12T16:06:18.716507 |
| d847ddbf-8a7b-41e4-935f-f6b21cf4830f | 800.011 | Loss of replication in replication group group-0: no OSDs | cluster=a0539d19-9f7a-4086-b502-faf11533c777.peergroup=group-0.host=controller-1 | major | 2023-09-12T15:35:40.053351 |
| 964fd76e-815d-4b7f-a4d4-eea33e671752 | 400.002 | Service group web-services loss of redundancy; expected 2 active members but only 1 active member available | service_domain=controller.service_group=web-services | major | 2023-09-12T15:35:00.457362 |
| e5dd39ba-0045-4929-9795-e3580a213bf4 | 400.002 | Service group controller-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=controller-services | major | 2023-09-12T15:34:51.687316 |
| 740ecf2b-1ba0-40ec-8c4a-ba7ca087cb8e | 400.002 | Service group vim-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=vim-services | major | 2023-09-12T15:34:51.491325 |
| 29944748-687a-401d-a67f-3b5a8c55dee0 | 400.002 | Service group cloud-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=cloud-services | major | 2023-09-12T15:34:39.349296 |
| 38952301-ab6a-48fd-a913-c33be5df7b42 | 400.002 | Service group storage-services loss of redundancy; expected 2 active members but only 1 active member available | service_domain=controller.service_group=storage-services | major | 2023-09-12T15:34:38.407315 |
| 5c4d39e1-312c-46b3-8aee-5ddb03a6f71a | 400.002 | Service group oam-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=oam-services | major | 2023-09-12T15:34:38.235327 |
| 5538f052-8c7c-496c-9d06-406ae05fb07f | 400.002 | Service group patching-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=patching-services | major | 2023-09-12T15:34:37.567335 |
| 79b6ce34-9636-4176-aaad-f4876e11bcaf | 400.002 | Service group directory-services loss of redundancy; expected 2 active members but only 1 active member available | service_domain=controller.service_group=directory-services | major | 2023-09-12T15:34:37.399290 |
| 2447569a-ccb8-4830-bd5d-662db32fb6c0 | 400.002 | Service group storage-monitoring-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=storage-monitoring-services | major | 2023-09-12T15:34:36.873307 |
| 5cbe1123-106d-4da7-9bcc-582af874f629 | 400.005 | Communication failure detected with peer over port vlan160 on host controller-0 | host=controller-0.network=mgmt | major | 2023-09-12T15:34:36.707289 |
| 26fae809-dac5-4c6d-a659-e9295b906b1e | 400.005 | Communication failure detected with peer over port eno1 on host controller-0 | host=controller-0.network=oam | major | 2023-09-12T15:34:36.474391 |
| 06cd23bf-170b-4632-8a9c-a0167e6e219c | 400.005 | Communication failure detected with peer over port vlan160 on host controller-0 | host=controller-0.network=cluster-host | major | 2023-09-12T15:34:36.238757 |
+--------------------------------------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+----------+----------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$

Automation log:
http://128.224.150.21/auto_logs/sys_install/stx/sm_5_6/202309121031/TIS_AUTOMATION.log

collect log:

Test Activity
-------------
installation

Revision history for this message
Peng Peng (ppeng) wrote:
summary: - DX DM config controller-1 failed, INSYNC in false status
+ DX config controller-1 failed, INSYNC in false status
description: updated
Revision history for this message
Ghada Khalil (gkhalil) wrote (last edit):

Marking as High since this is causing a red sanity for DX deployments.

Changed in starlingx:
assignee: nobody → Chris Friesen (cbf123)
tags: added: stx.9.0 stx.containers
Changed in starlingx:
importance: Undecided → High
Revision history for this message
Ghada Khalil (gkhalil) wrote (last edit):

Issue introduced by: https://review.opendev.org/c/starlingx/stx-puppet/+/893091
Revert has been merged on Sept 15: https://review.opendev.org/c/starlingx/stx-puppet/+/895364

Setting this LP to Fix Released based on the revert.

Changed in starlingx:
status: New → Fix Released