Brief Description
-----------------
About 5 minutes after a host-swact completed, an unexpected swact back to controller-0 occurred, and alarm-list showed alarm 270.001 "Host controller-1 compute services failure".
Severity
--------
Major
Steps to Reproduce
------------------
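Run host-swact on the active controller (controller-0).
After the swact completes, run some read-only commands on the new active controller (e.g. stat /etc/platform/.task_affining_incomplete).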
TC-name: mtc/test_swact.py::test_swact_controller_platform
Expected Behavior
------------------
No unexpected swact occurs after the requested swact completes.
Actual Behavior
----------------
An unexpected swact occurred.
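As an illustration only, the sketch below shows one way a test could flag this condition from the shell prompts it sees after the requested swact. The function names and the prompt pattern are assumptions made for this example (loosely modelled on the prompts in the logs below); they are not taken from the test framework.

```python
import re

# Hypothetical pattern: hostname followed by ':' or ' ', then a '$' later on.
PROMPT_RE = re.compile(r"(controller-[01])[: ].*\$")


def active_controller(prompt):
    """Return the controller hostname found in a shell prompt, or None."""
    match = PROMPT_RE.search(prompt)
    return match.group(1) if match else None


def first_unexpected_host(prompts, expected):
    """Return the first hostname in prompts that differs from expected.

    Prompts without a recognizable hostname are ignored. Returns None
    when every recognized prompt belongs to the expected controller.
    """
    for prompt in prompts:
        host = active_controller(prompt)
        if host is not None and host != expected:
            return host
    return None


# Prompts in the order they appear in the logs after the requested swact.
seen = ["controller-1:~$", "controller-1:~$", "controller-0:~$"]
print(first_unexpected_host(seen, expected="controller-1"))
```

With the prompt sequence from this report, the check reports controller-0, i.e. the floating IP moved back without a requested swact.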
Reproducibility
---------------
Seen once
System Configuration
--------------------
Two-node system
Lab-name: IP_5-6
Branch/Pull Time/Commit
-----------------------
stx master as of 2019-08-13_20-59-00
Last Pass
---------
Lab: WCP_63_66
Load: 2019-08-13_20-59-00
Timestamp/Logs
--------------
[2019-08-14 08:52:10,857] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-swact controller-0'
[sysadmin@controller-0 ~(keystone_admin)]$
[2019-08-14 08:52:35,441] 301 DEBUG MainThread ssh.send :: Send ''
[2019-08-14 08:52:38,545] 275 INFO MainThread ssh.wait_for_disconnect:: ssh session to 128.224.151.216 disconnected
[2019-08-14 08:52:38,545] 1564 INFO MainThread host_helper.wait_for_swact_complete:: ssh to 128.224.151.216 OAM floating IP disconnected, indicating swact initiated.
[2019-08-14 08:53:08,575] 151 INFO MainThread ssh.connect :: Attempt to connect to host - 128.224.151.216
[2019-08-14 08:53:09,956] 301 DEBUG MainThread ssh.send :: Send ''
[2019-08-14 08:53:10,059] 423 DEBUG MainThread ssh.expect :: Output:
controller-1:~$
controller-1:~$
[2019-08-14 08:56:09,694] 466 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-08-14 08:56:09,695] 301 DEBUG MainThread ssh.send :: Send 'stat /etc/platform/.task_affining_incomplete'
[2019-08-14 08:57:09,812] 394 WARNING MainThread ssh.expect :: No match found for ['.*controller\\-[01][:| ].*\\$ '].
expect timeout.
[2019-08-14 09:01:50,420] 793 DEBUG MainThread ssh.close :: connection closed. host: 128.224.151.216, user: sysadmin. Object ID: 140532888100312
[2019-08-14 09:01:50,420] 248 DEBUG MainThread ssh.connect :: Retry in 10 seconds
[2019-08-14 09:02:01,700] 301 DEBUG MainThread ssh.send :: Send ''
[2019-08-14 09:02:01,803] 423 DEBUG MainThread ssh.expect :: Output:
controller-0:~$
[2019-08-14 09:02:16,375] 301 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2019-08-14 09:02:17,752] 423 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+----------+----------------------------+
| f900135a-72ca-42b1-a8dc-00a2c76e52e7 | 270.001 | Host controller-1 compute services failure, | host=controller-1.services=compute | critical | 2019-08-14T09:02:12.648454 |
| 752dc3ff-4e14-4acc-8482-b4de9c90cd03 | 200.004 | controller-1 experienced a service-affecting failure. Auto-recovery in progress. Manual Lock and Unlock may be required if auto-recovery is unsuccessful. | host=controller-1 | critical | 2019-08-14T09:02:08.649239 |
| a8b4dc4e-429c-4650-86ca-bd542b56aed6 | 400.002 | Service group vim-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=vim-services | major | 2019-08-14T09:02:01.418293 |
| a6c0928c-b82f-4145-bab6-329dcd84a2a3 | 400.002 | Service group directory-services loss of redundancy; expected 2 active members but only 1 active member available | service_domain=controller.service_group=directory-services | major | 2019-08-14T09:02:01.337270 |
| c06d1156-8220-4b84-82a5-5e23798d3154 | 400.002 | Service group web-services loss of redundancy; expected 2 active members but only 1 active member available | service_domain=controller.service_group=web-services | major | 2019-08-14T09:02:01.256242 |
| 6edce745-1622-4afc-b64a-26ecbbf08faf | 400.002 | Service group storage-services loss of redundancy; expected 2 active members but only 1 active member available | service_domain=controller.service_group=storage-services | major | 2019-08-14T09:02:01.175300 |
| d88e327a-861a-425a-81cc-b78cc2a2b994 | 400.002 | Service group cloud-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=cloud-services | major | 2019-08-14T09:02:00.501254 |
| 700abbc8-c44f-495a-b08a-ad655d23982a | 400.002 | Service group patching-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=patching-services | major | 2019-08-14T09:02:00.420246 |
| fe6b6b82-eb2d-4919-9673-86bcd39e91cc | 400.002 | Service group storage-monitoring-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=storage-monitoring-services | major | 2019-08-14T09:02:00.096293 |
| 0bac73c7-0207-4140-a382-6011c8c68dc7 | 400.005 | Communication failure detected with peer over port enp11s0f1 on host controller-0 | host=controller-0.network=mgmt | major | 2019-08-14T09:02:00.015249 |
| 2b2b2bf2-47ae-4b75-8bbc-81815347613e | 400.005 | Communication failure detected with peer over port enp11s0f0 on host controller-0 | host=controller-0.network=oam | major | 2019-08-14T09:01:59.934269 |
| c570309c-2e7e-48ef-aac6-69ea24d7980f | 400.005 | Communication failure detected with peer over port ens3f0 on host controller-0 | host=controller-0.network=cluster-host | major | 2019-08-14T09:01:59.853257 |
| 001ca8f1-fee3-4f3d-ba78-7b935518a8ea | 400.002 | Service group controller-services has no active members available; expected 1 active member | service_domain=controller.service_group=controller-services | critical | 2019-08-14T09:01:59.611279 |
| 74ec6293-a70a-4c24-97e5-1fb5bde6549a | 400.002 | Service group oam-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=oam-services | major | 2019-08-14T09:01:59.449250 |
| 113a5ed5-3971-43c1-b5c4-7f084e17c50d | 100.114 | NTP address 2607:5300:60:97 is not a valid or a reachable NTP server. | host=controller-1.ntp=2607:5300:60:97 | minor | 2019-08-14T08:39:57.707422 |
+--------------------------------------+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+----------+----------------------------+
controller-0:~$
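For triage, the alarm table above can be reduced to the critical entries with a small script. This is a minimal sketch that assumes the '|'-delimited layout shown in the output above; `parse_alarm_table` is a hypothetical helper written for this example, not part of the fm CLI or the test framework, and the sample is a shortened copy of two rows from the table.

```python
def parse_alarm_table(text):
    """Parse a '|'-delimited CLI table into a list of dicts keyed by header."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        # Skip border lines ('+---+') and anything that is not a table row.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first row holds the column names
        elif len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


# Shortened sample: borders abbreviated, two rows copied from the log above.
sample = """
+------+----------+-------------+-----------+----------+------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+------+----------+-------------+-----------+----------+------------+
| f900135a-72ca-42b1-a8dc-00a2c76e52e7 | 270.001 | Host controller-1 compute services failure, | host=controller-1.services=compute | critical | 2019-08-14T09:02:12.648454 |
| 0bac73c7-0207-4140-a382-6011c8c68dc7 | 400.005 | Communication failure detected with peer over port enp11s0f1 on host controller-0 | host=controller-0.network=mgmt | major | 2019-08-14T09:02:00.015249 |
+------+----------+-------------+-----------+----------+------------+
"""

alarms = parse_alarm_table(sample)
critical_ids = [a["Alarm ID"] for a in alarms if a["Severity"] == "critical"]
print(critical_ids)
```

Applied to the full table, the same filter would list 270.001, 200.004 and the controller-services 400.002 alarm, which are the three rows marked critical.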
Test Activity
-------------
Sanity
Assigning to Bin to triage before deciding on priority / release gate.