Activity log for bug #2035277

Date                 Who         What changed
2023-09-12 18:22:56  Peng Peng   bug added
2023-09-12 18:22:56  Peng Peng   attachment added: ALL_NODES_20230912.174632.tar
                                 https://bugs.launchpad.net/bugs/2035277/+attachment/5700408/+files/ALL_NODES_20230912.174632.tar
2023-09-12 18:25:59  Peng Peng   summary changed
                                 Old: DX DM config controller-1 failed, INSYNC in false status
                                 New: DX config controller-1 failed, INSYNC in false status
2023-09-12 18:26:15  Peng Peng   description changed
                                 (The old and new descriptions are identical except for the first line of the
                                 Brief Description, which changed from "DX DM config controller-1 failed,
                                 INSYNC in false status" to "DX system initial config controller-1 failed,
                                 INSYNC in false status". The new description follows.)

Brief Description
-----------------
DX system initial config controller-1 failed, INSYNC in false status
manually fixed things up and re-ran the "kubeadm join"

Severity
--------
Major

Steps to Reproduce
------------------
DX DM config controller-1
TC-name:

Expected Behavior
------------------
DX DM config controller-1 success

Actual Behavior
----------------
DX DM config controller-1 failed

Reproducibility
---------------
This is the first time this issue was seen

System Configuration
--------------------
Two node system
Lab-name: SM_5-6

Branch/Pull Time/Commit
-----------------------
Job: STX_build_debian_master
Build ID: 20230910T060000Z

Last Pass
---------
20230904T060000Z

Timestamp/Logs
--------------
[2023-09-12 16:48:56,547] 349 DEBUG MainThread ssh.send :: Send 'kubectl get hosts -n=deployment -o=wide'
[2023-09-12 16:48:56,597] 551 DEBUG MainThread ssh.exec_cmd :: Expecting \[.*@controller\-[01] .*\(keystone_admin\)\]\$ in prompt
[2023-09-12 16:48:56,661] 471 DEBUG MainThread ssh.expect :: Output:

NAME          ADMINISTRATIVE  OPERATIONAL  AVAILABILITY  PROFILE               INSYNC  SCOPE      RECONCILED
controller-0  unlocked        enabled      available     controller-0-profile  true    bootstrap  true
controller-1  unlocked        disabled     offline       controller-0-profile  false   bootstrap  false

[2023-09-12 16:49:25,032] 349 DEBUG MainThread ssh.send :: Send 'fm --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2023-09-12 16:49:25,082] 551 DEBUG MainThread ssh.exec_cmd :: Expecting \[.*@controller\-[01] .*\(keystone_admin\)\]\$ in prompt
[2023-09-12 16:49:27,542] 471 DEBUG MainThread ssh.expect :: Output:

UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp
f1461281-034c-4ecc-b741-bf9f1ba38e55 | 250.001 | controller-1 Configuration is out-of-date. (applied: 7a86fdd5-fae1-460b-bb78-3e06cc890341 target: cd7d648a-f0d5-4433-a1f6-eb7fe2ff625c) | host=controller-1 | major | 2023-09-12T16:10:30.354532
92b2d036-7226-439a-9597-3cbd0641d21e | 200.004 | controller-1 experienced a service-affecting failure. Auto-recovery in progress. Manual Lock and Unlock may be required if auto-recovery is unsuccessful. | host=controller-1 | critical | 2023-09-12T16:06:18.767276
e103f00e-e0af-472c-bf49-04de02368317 | 200.011 | controller-1 experienced a configuration failure. | host=controller-1 | critical | 2023-09-12T16:06:18.716507
d847ddbf-8a7b-41e4-935f-f6b21cf4830f | 800.011 | Loss of replication in replication group group-0: no OSDs | cluster=a0539d19-9f7a-4086-b502-faf11533c777.peergroup=group-0.host=controller-1 | major | 2023-09-12T15:35:40.053351
964fd76e-815d-4b7f-a4d4-eea33e671752 | 400.002 | Service group web-services loss of redundancy; expected 2 active members but only 1 active member available | service_domain=controller.service_group=web-services | major | 2023-09-12T15:35:00.457362
e5dd39ba-0045-4929-9795-e3580a213bf4 | 400.002 | Service group controller-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=controller-services | major | 2023-09-12T15:34:51.687316
740ecf2b-1ba0-40ec-8c4a-ba7ca087cb8e | 400.002 | Service group vim-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=vim-services | major | 2023-09-12T15:34:51.491325
29944748-687a-401d-a67f-3b5a8c55dee0 | 400.002 | Service group cloud-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=cloud-services | major | 2023-09-12T15:34:39.349296
38952301-ab6a-48fd-a913-c33be5df7b42 | 400.002 | Service group storage-services loss of redundancy; expected 2 active members but only 1 active member available | service_domain=controller.service_group=storage-services | major | 2023-09-12T15:34:38.407315
5c4d39e1-312c-46b3-8aee-5ddb03a6f71a | 400.002 | Service group oam-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=oam-services | major | 2023-09-12T15:34:38.235327
5538f052-8c7c-496c-9d06-406ae05fb07f | 400.002 | Service group patching-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=patching-services | major | 2023-09-12T15:34:37.567335
79b6ce34-9636-4176-aaad-f4876e11bcaf | 400.002 | Service group directory-services loss of redundancy; expected 2 active members but only 1 active member available | service_domain=controller.service_group=directory-services | major | 2023-09-12T15:34:37.399290
2447569a-ccb8-4830-bd5d-662db32fb6c0 | 400.002 | Service group storage-monitoring-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=storage-monitoring-services | major | 2023-09-12T15:34:36.873307
5cbe1123-106d-4da7-9bcc-582af874f629 | 400.005 | Communication failure detected with peer over port vlan160 on host controller-0 | host=controller-0.network=mgmt | major | 2023-09-12T15:34:36.707289
26fae809-dac5-4c6d-a659-e9295b906b1e | 400.005 | Communication failure detected with peer over port eno1 on host controller-0 | host=controller-0.network=oam | major | 2023-09-12T15:34:36.474391
06cd23bf-170b-4632-8a9c-a0167e6e219c | 400.005 | Communication failure detected with peer over port vlan160 on host controller-0 | host=controller-0.network=cluster-host | major | 2023-09-12T15:34:36.238757

[sysadmin@controller-0 ~(keystone_admin)]$

Automation log: http://128.224.150.21/auto_logs/sys_install/stx/sm_5_6/202309121031/TIS_AUTOMATION.log
collect log:

Test Activity
-------------
installation
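The failing state in the log above is controller-1's INSYNC column reading "false". As a minimal sketch (assuming only a POSIX shell and awk; the table is copied verbatim from the log), the same filter that the automation applies can be replayed offline. On a live controller the awk step would instead be fed directly from `kubectl get hosts -n deployment -o wide`, the command the automation sent:

```shell
# Captured output of `kubectl get hosts -n=deployment -o=wide` from the log.
cat <<'EOF' > /tmp/hosts.txt
NAME          ADMINISTRATIVE  OPERATIONAL  AVAILABILITY  PROFILE               INSYNC  SCOPE      RECONCILED
controller-0  unlocked        enabled      available     controller-0-profile  true    bootstrap  true
controller-1  unlocked        disabled     offline       controller-0-profile  false   bootstrap  false
EOF

# INSYNC is the 6th whitespace-separated column; skip the header row and
# print the name of every host that is out of sync.
awk 'NR > 1 && $6 == "false" { print $1 }' /tmp/hosts.txt
```

Running this against the captured table prints `controller-1`, matching the host that the 200.011 and 250.001 alarms flag.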
2023-09-13 14:07:12  Ghada Khalil  starlingx: assignee set to Chris Friesen (cbf123)
2023-09-13 14:08:15  Ghada Khalil  tags added: stx.9.0 stx.containers
2023-09-13 14:08:22  Ghada Khalil  starlingx: importance changed from Undecided to High
2023-09-15 21:11:47  Ghada Khalil  starlingx: status changed from New to Fix Released