Brief Description
-----------------
After the host-swact command completed, alarm 400.002 "Service group controller-services has no active members available; expected 1 active member" was raised, and an unexpected swact occurred.
Severity
--------
Critical
Steps to Reproduce
------------------
host-swact
check fm alarm-list
TC-name: z_containers/test_custom_containers.py::test_upload_charts_via_helm_upload
Expected Behavior
------------------
No unexpected swact
Actual Behavior
----------------
Unexpected swact occurred
Reproducibility
---------------
Seen once
System Configuration
--------------------
Two node system
Lab-name: WCP_76-77
Branch/Pull Time/Commit
-----------------------
stx master as of 2019-09-05_00-10-00
Last Pass
---------
2019-09-03_00-10-00
Timestamp/Logs
--------------
[2019-09-05 17:54:19,018] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-swact controller-1'
[2019-09-05 17:58:02,286] 301 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2019-09-05 17:58:03,661] 423 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+---------------------------------------------------------------------------------------------+-------------------------------------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+---------------------------------------------------------------------------------------------+-------------------------------------------------------------+----------+----------------------------+
| e1cdca69-48f8-47ab-9b19-075980764dc1 | 400.002 | Service group controller-services has no active members available; expected 1 active member | service_domain=controller.service_group=controller-services | critical | 2019-09-05T17:55:09.620802 |
| 3bd10bb5-d210-446e-9429-2d1be70efa58 | 100.114 | NTP address 2607:5300:201:3 is not a valid or a reachable NTP server. | host=controller-1.ntp=2607:5300:201:3 | minor | 2019-09-05T17:41:01.187069 |
| 2c68c9ae-cf55-494a-a449-76b80445a288 | 100.114 | NTP address 199.19.167.36 is not a valid or a reachable NTP server. | host=controller-1.ntp=199.19.167.36 | minor | 2019-09-05T17:41:01.172656 |
+--------------------------------------+----------+---------------------------------------------------------------------------------------------+-------------------------------------------------------------+----------+----------------------------+
controller-0:~$
[2019-09-05 17:58:03,661] 301 DEBUG MainThread ssh.send :: Send 'echo $?'
[2019-09-05 17:58:03,764] 423 DEBUG MainThread ssh.expect :: Output:
0
controller-0:~$
[2019-09-05 17:58:08,769] 1534 DEBUG MainThread ssh.get_active_controller:: Getting active controller client for wcp_76_77
[2019-09-05 17:58:08,770] 466 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-09-05 17:58:08,770] 301 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2019-09-05 17:59:26,256] 387 WARNING MainThread ssh.expect :: EOF caught.
[2019-09-05 17:59:26,785] 60 DEBUG MainThread conftest.update_results:: ***Failure at test teardown: /usr/local/lib/python3.4/site-packages/pexpect/expect.py:49: pexpect.exceptions.EOF: End Of File (EOF). Exception style platform.
***Details: self = <pexpect.pxssh.pxssh object at 0x7f0e2669d278>, size = 100000
[2019-09-05 17:59:30,537] 151 INFO MainThread ssh.connect :: Attempt to connect to host - 128.224.150.5
[2019-09-05 17:59:32,033] 301 DEBUG MainThread ssh.send :: Send ''
[2019-09-05 17:59:32,141] 423 DEBUG MainThread ssh.expect :: Output:
controller-1:~$
[2019-09-05 17:59:32,141] 183 INFO MainThread ssh.connect :: Login successful!
[2019-09-05 17:59:38,877] 301 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2019-09-05 17:59:41,688] 423 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+----------+----------------------------+
| 741c611c-52d9-41e3-ba41-e24e3aba7f58 | 400.002 | Service group controller-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=controller-services | major | 2019-09-05T17:59:34.555618 |
| 2ef8b4d6-721c-48d7-a7b4-8d564f8f6462 | 400.002 | Service group cloud-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=cloud-services | major | 2019-09-05T17:59:31.553536 |
| 9c08c0ca-4a96-4be1-8bac-25e6d2d358ba | 400.002 | Service group oam-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=oam-services | major | 2019-09-05T17:59:29.727512 |
| 4d5a0a49-e80b-4fc1-bb1d-51ee1ab7aba1 | 400.002 | Service group vim-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=vim-services | major | 2019-09-05T17:59:29.710517 |
| fe1ef010-18c7-4d24-adb4-40b17bf7e98b | 400.001 | Service group controller-services failure; ceph-mon(disabled, ) | service_domain=controller.service_group=controller-services.host=controller-0 | critical | 2019-09-05T17:59:29.680300 |
| 3bd10bb5-d210-446e-9429-2d1be70efa58 | 100.114 | NTP address 2607:5300:201:3 is not a valid or a reachable NTP server. | host=controller-1.ntp=2607:5300:201:3 | minor | 2019-09-05T17:41:01.187069 |
| 2c68c9ae-cf55-494a-a449-76b80445a288 | 100.114 | NTP address 199.19.167.36 is not a valid or a reachable NTP server. | host=controller-1.ntp=199.19.167.36 | minor | 2019-09-05T17:41:01.172656 |
+--------------------------------------+----------+------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+----------+----------------------------+
controller-1:~$
Test Activity
-------------
Sanity
Ceph-mon failed to enable on the newly active controller, so the controller-services group went go-active-failed and SM swacted back, producing the unexpected swact.

2019-09-05T17:55:12.677 | 565 | service-scn | hw-mon | enabling | enabled-active | enable success
2019-09-05T17:56:02.854 | 566 | service-scn | ceph-mon | enabling | disabling | enable failed
2019-09-05T17:58:03.580 | 567 | service-scn | ceph-mon | disabling | disabling-failed | disable failed
2019-09-05T17:58:03.581 | 568 | service-group-scn | controller-services | go-active | go-active-failed | ceph-mon(disabling, failed)
2019-09-05T17:58:04.477 | 569 | service-scn | ceph-mon | disabling-failed | enabling-failed | enabled-active state requested
2019-09-05T17:58:07.793 | 570 | service-group-scn | vim-services | active | disabling |
2019-09-05T17:58:07.794 | 571 | service-group-scn | cloud-services | active | disabling |
2019-09-05T17:58:07.795 | 572 | service-group-scn | controller-services | go-active-failed | disabling-failed | ceph-mon(enabling, failed)
2019-09-05T17:58:07.796 | 573 | service-group-scn | oam-services | active | disabling |
2019-09-05T17:55:02.000 controller-0 sm: debug time[7529.198] log<1426> INFO: sm[107278]: sm_service_enable.c(461): Started enable action (1008359) for service (ceph-mon).
2019-09-05T17:55:02.000 controller-0 sm: debug time[7529.198] log<1427> INFO: sm[107278]: sm_service_fsm.c(1032): Service (ceph-mon) received event (enable-throttle) was in the enabling-throttle state and is now in the enabling state.
2019-09-05T17:55:32.000 controller-0 sm: debug time[7559.216] log<1572> INFO: sm[107278]: sm_service_enable.c(245): Action (enable) timeout with result (failed), state (unknown), status (unknown), and condition (unknown) for service (ceph-mon), reason_text=, exit_code=-65534.
2019-09-05T17:55:32.000 controller-0 sm: debug time[7559.216] log<1573> INFO: sm[107278]: sm_service_action.c(345): Aborting service (ceph-mon) with kill signal, pid=1008359.
2019-09-05T17:55:32.000 controller-0 sm: debug time[7559.216] log<1574> INFO: sm[107278]: sm_service_enable.c(154): Max retires not met for action (enable) of service (ceph-mon), attempts=1.
2019-09-05T17:55:32.000 controller-0 sm: debug time[7559.216] log<1575> INFO: sm[107278]: sm_service_enable.c(461): Started enable action (1013901) for service (ceph-mon).
2019-09-05T17:56:02.000 controller-0 sm: debug time[7589.223] log<1577> INFO: sm[107278]: sm_service_enable.c(245): Action (enable) timeout with result (failed), state (unknown), status (unknown), and condition (unknown) for service (ceph-mon), reason_text=, ...
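The SM state-change (scn) entries above can be scanned programmatically to locate the failed transitions that triggered the rollback. A minimal sketch (the function is illustrative; the pipe-separated field layout is assumed from the excerpt in this report):

```python
# Sketch: scan StarlingX sm-customer.log style scn entries and yield
# transitions whose target state or reason text indicates a failure.
# Field layout (assumed from the excerpt): ts | seq | kind | entity |
# from-state | to-state | reason.
def failed_transitions(log_lines):
    """Yield (timestamp, entity, from_state, to_state, reason) tuples
    for rows that ended in a failed state or a failure reason."""
    for line in log_lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 6:
            continue  # not a scn table row
        ts, _seq, _kind, entity, src, dst = fields[:6]
        reason = fields[6] if len(fields) > 6 else ""
        if "failed" in dst or "failed" in reason:
            yield ts, entity, src, dst, reason
```

Run against the scn rows above, this would flag the ceph-mon enable/disable failures and the controller-services go-active-failed transition while skipping healthy rows such as the hw-mon enable success.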