Version: 4.1.0.0-37-ocata
Topology: 3-node HA with multiple computes (multi-cluster ESXi) and KVM
Showed the problem setup to Ram; he triaged it and found that the issue is related to the one described below:
https://ask.openstack.org/en/question/111005/kolla-ansible-pike-installation-fluentd-container-in-restarting-mode/
>>> SM debug.log excerpt showing the failures:
"2017-11-10 17:22:46,634-INFO-sm_ansible_callback.py:53-append(): TASK [mariadb : Waiting for MariaDB service to be ready]"
"2017-11-10 17:23:12,523-DEBUG-server_mgr_main.py:894-validate_smgr_get(): match key returned: {'cluster_id': ['cluster-vcenter-compute']}"
"2017-11-10 17:23:12,524-DEBUG-server_mgr_main.py:896-validate_smgr_get(): select keys: None"
"2017-11-10 17:23:12,905-DEBUG-server_mgr_main.py:894-validate_smgr_get(): match key returned: {'cluster_id': ['cluster-vcenter-compute']}"
"2017-11-10 17:23:12,906-DEBUG-server_mgr_main.py:896-validate_smgr_get(): select keys: None"
"2017-11-10 17:23:20,882-INFO-server_mgr_ssh_client.py:65-connect(): CONNECT SUCCESS: Host => 10.87.36.15, option => key"
"2017-11-10 17:23:47,665-INFO-server_mgr_ssh_client.py:60-connect(): CONNECT FAILED: Host => 10.87.36.19, option => password, ERROR => [Errno None] Unable to connect to port 22 on 10.87.36.19"
"2017-11-10 17:23:47,666-ERROR-server_mgr_mon_base_plugin.py:465-copy_ssh_keys_to_server(): COPY-KEYS: Host : contrailvm-5a10s27 Try: 59: ERROR Copying Keys: [Errno None] Unable to connect to port 22 on 10.87.36.19"
"2017-11-10 17:23:48,721-INFO-server_mgr_ssh_client.py:60-connect(): CONNECT FAILED: Host => 10.87.36.21, option => password, ERROR => [Errno None] Unable to connect to port 22 on 10.87.36.21"
"2017-11-10 17:23:48,721-ERROR-server_mgr_mon_base_plugin.py:465-copy_ssh_keys_to_server(): COPY-KEYS: Host : contrailvm-5a10s31 Try: 59: ERROR Copying Keys: [Errno None] Unable to connect to port 22 on 10.87.36.21"
"2017-11-10 17:23:49,721-INFO-server_mgr_ssh_client.py:60-connect(): CONNECT FAILED: Host => 10.87.36.20, option => password, ERROR => [Errno None] Unable to connect to port 22 on 10.87.36.20"
"2017-11-10 17:23:49,722-ERROR-server_mgr_mon_base_plugin.py:465-copy_ssh_keys_to_server(): COPY-KEYS: Host : contrailvm-5a10s25 Try: 59: ERROR Copying Keys: [Errno None] Unable to connect to port 22 on 10.87.36.20"
"2017-11-10 17:23:58,891-INFO-server_mgr_ssh_client.py:65-connect(): CONNECT SUCCESS: Host => 10.87.36.12, option => key"
"2017-11-10 17:24:39,741-INFO-server_mgr_ssh_client.py:60-connect(): CONNECT FAILED: Host => 10.87.36.21, option => key, ERROR => [Errno None] Unable to connect to port 22 on 10.87.36.21"
"2017-11-10 17:24:53,319-DEBUG-server_mgr_main.py:894-validate_smgr_get(): match key returned: {'cluster_id': ['cluster-vcenter-compute']}"
"2017-11-10 17:24:53,319-DEBUG-server_mgr_main.py:896-validate_smgr_get(): select keys: None"
"2017-11-10 17:24:53,711-DEBUG-server_mgr_main.py:894-validate_smgr_get(): match key returned: {'cluster_id': ['cluster-vcenter-compute']}"
"2017-11-10 17:24:53,711-DEBUG-server_mgr_main.py:896-validate_smgr_get(): select keys: None"
"2017-11-10 17:25:14,894-INFO-server_mgr_ssh_client.py:65-connect(): CONNECT SUCCESS: Host => 10.87.36.11, option => key"
"2017-11-10 17:25:50,665-INFO-server_mgr_ssh_client.py:60-connect(): CONNECT FAILED: Host => 10.87.36.19, option => password, ERROR => [Errno None] Unable to connect to port 22 on 10.87.36.19"
"2017-11-10 17:25:50,666-ERROR-server_mgr_mon_base_plugin.py:465-copy_ssh_keys_to_server(): COPY-KEYS: Host : contrailvm-5a10s27 Try: 60: ERROR Copying Keys: [Errno None] Unable to connect to port 22 on 10.87.36.19"
"2017-11-10 17:25:51,721-INFO-server_mgr_ssh_client.py:60-connect(): CONNECT FAILED: Host => 10.87.36.21, option => password, ERROR => [Errno None] Unable to connect to port 22 on 10.87.36.21"
"2017-11-10 17:25:51,722-ERROR-server_mgr_mon_base_plugin.py:465-copy_ssh_keys_to_server(): COPY-KEYS: Host : contrailvm-5a10s31 Try: 60: ERROR Copying Keys: [Errno None] Unable to connect to port 22 on 10.87.36.21"
"2017-11-10 17:25:52,721-INFO-server_mgr_ssh_client.py:60-connect(): CONNECT FAILED: Host => 10.87.36.20, option => password, ERROR => [Errno None] Unable to connect to port 22 on 10.87.36.20"
"2017-11-10 17:25:52,721-ERROR-server_mgr_mon_base_plugin.py:465-copy_ssh_keys_to_server(): COPY-KEYS: Host : contrailvm-5a10s25 Try: 60: ERROR Copying Keys: [Errno None] Unable to connect to port 22 on 10.87.36.20"
"2017-11-10 17:25:52,894-INFO-server_mgr_ssh_client.py:65-connect(): CONNECT SUCCESS: Host => 10.87.36.10, option => key"
"2017-11-10 17:26:33,745-INFO-server_mgr_ssh_client.py:60-connect(): CONNECT FAILED: Host => 10.87.36.20, option => key, ERROR => [Errno None] Unable to connect to port 22 on 10.87.36.20"
"2017-11-10 17:26:34,093-DEBUG-server_mgr_main.py:894-validate_smgr_get(): match key returned: {'cluster_id': ['cluster-vcenter-compute']}"
"2017-11-10 17:26:34,094-DEBUG-server_mgr_main.py:896-validate_smgr_get(): select keys: None"
"2017-11-10 17:26:34,464-DEBUG-server_mgr_main.py:894-validate_smgr_get(): match key returned: {'cluster_id': ['cluster-vcenter-compute']}"
"2017-11-10 17:26:34,466-DEBUG-server_mgr_main.py:896-validate_smgr_get(): select keys: None"
"2017-11-10 17:27:11,745-INFO-server_mgr_ssh_client.py:60-connect(): CONNECT FAILED: Host => 10.87.36.19, option => key, ERROR => [Errno None] Unable to connect to port 22 on 10.87.36.19"
"2017-11-10 17:27:46,898-INFO-server_mgr_ssh_client.py:65-connect(): CONNECT SUCCESS: Host => 10.87.36.18, option => key"
"2017-11-10 17:27:53,665-INFO-server_mgr_ssh_client.py:60-connect(): CONNECT FAILED: Host => 10.87.36.19, option => password, ERROR => [Errno None] Unable to connect to port 22 on 10.87.36.19"
"2017-11-10 17:27:53,666-ERROR-server_mgr_mon_base_plugin.py:465-copy_ssh_keys_to_server(): COPY-KEYS: Host : contrailvm-5a10s27 Try: 61: ERROR Copying Keys: [Errno None] Unable to connect to port 22 on 10.87.36.19"
"2017-11-10 17:27:54,721-INFO-server_mgr_ssh_client.py:60-connect(): CONNECT FAILED: Host => 10.87.36.21, option => password, ERROR => [Errno None] Unable to connect to port 22 on 10.87.36.21"
"2017-11-10 17:27:54,722-ERROR-server_mgr_mon_base_plugin.py:465-copy_ssh_keys_to_server(): COPY-KEYS: Host : contrailvm-5a10s31 Try: 61: ERROR Copying Keys: [Errno None] Unable to connect to port 22 on 10.87.36.21"
"2017-11-10 17:27:55,721-INFO-server_mgr_ssh_client.py:60-connect(): CONNECT FAILED: Host => 10.87.36.20, option => password, ERROR => [Errno None] Unable to connect to port 22 on 10.87.36.20"
"2017-11-10 17:27:55,722-ERROR-server_mgr_mon_base_plugin.py:465-copy_ssh_keys_to_server(): COPY-KEYS: Host : contrailvm-5a10s25 Try: 61: ERROR Copying Keys: [Errno None] Unable to connect to port 22 on 10.87.36.20"
"2017-11-10 17:28:14,894-DEBUG-server_mgr_main.py:894-validate_smgr_get(): match key returned: {'cluster_id': ['cluster-vcenter-compute']}"
"2017-11-10 17:28:14,895-DEBUG-server_mgr_main.py:896-validate_smgr_get(): select keys: None"
"2017-11-10 17:28:15,262-DEBUG-server_mgr_main.py:894-validate_smgr_get(): match key returned: {'cluster_id': ['cluster-vcenter-compute']}"
"2017-11-10 17:28:15,263-DEBUG-server_mgr_main.py:896-validate_smgr_get(): select keys: None"
"2017-11-10 17:28:24,900-INFO-server_mgr_ssh_client.py:65-connect(): CONNECT SUCCESS: Host => 10.87.36.15, option => key"
"2017-11-10 17:28:51,676-INFO-sm_ansible_callback.py:53-append(): fatal: [10.87.36.11]: FAILED! => (item - None) {"attempts": 10, "changed": false, "failed": true, "module_stderr": "Shared connection to 10.87.36.11 closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n File \"/tmp/ansible_bhgfLW/ansible_module_wait_for.py\", line 585, in <module>\r\n main()\r\n File \"/tmp/ansible_bhgfLW/ansible_module_wait_for.py\", line 525, in main\r\n response = s.recv(1024)\r\nsocket.error: [Errno 104] Connection reset by peer\r\n", "msg": "MODULE FAILURE", "rc": 0}"
"2017-11-10 17:28:51,921-INFO-sm_ansible_callback.py:53-append(): fatal: [10.87.36.12]: FAILED! => (item - None) {"attempts": 10, "changed": false, "failed": true, "module_stderr": "Shared connection to 10.87.36.12 closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n File \"/tmp/ansible_JRKczT/ansible_module_wait_for.py\", line 585, in <module>\r\n main()\r\n File \"/tmp/ansible_JRKczT/ansible_module_wait_for.py\", line 525, in main\r\n response = s.recv(1024)\r\nsocket.error: [Errno 104] Connection reset by peer\r\n", "msg": "MODULE FAILURE", "rc": 0}"
"2017-11-10 17:28:51,924-INFO-sm_ansible_utils.py:496-send_REST_request(): Sending post request to http://10.87.36.10:9002/ansible_status?server_id=10.87.36.10&state=provision_failed"
"2017-11-10 17:28:51,926-DEBUG-server_mgr_status.py:134-put_ansible_status(): Server status Data 5a10s31 provision_failed 2017_11_10__17_28_51"
/auto/cores/ 1731596$ ls
log.tar
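
To make the SSH pattern in the debug.log excerpt above easier to see, here is a minimal sketch that counts the CONNECT SUCCESS / CONNECT FAILED entries per host and lists hosts that never connected. It is not part of Server Manager: the function names summarize_ssh_connects and unreachable_hosts are hypothetical, and the regular expression assumes the line format is exactly the one shown in the excerpt.

```python
import re
from collections import Counter

# Matches the server_mgr_ssh_client.py lines as they appear in the excerpt, e.g.
#   ...connect(): CONNECT FAILED: Host => 10.87.36.19, option => password, ERROR => ...
# Assumption: hosts are logged as IPv4 addresses, as in the excerpt.
CONNECT_RE = re.compile(
    r"CONNECT (?P<result>SUCCESS|FAILED): "
    r"Host => (?P<host>[\d.]+), option => (?P<option>\w+)"
)


def summarize_ssh_connects(lines):
    """Count log entries per (host, result) pair; non-matching lines are ignored."""
    counts = Counter()
    for line in lines:
        match = CONNECT_RE.search(line)
        if match:
            counts[(match.group("host"), match.group("result"))] += 1
    return counts


def unreachable_hosts(counts):
    """Return hosts with at least one failure and no successful connection."""
    failed = {host for (host, result) in counts if result == "FAILED"}
    succeeded = {host for (host, result) in counts if result == "SUCCESS"}
    return sorted(failed - succeeded)
```

Feeding it the lines of debug.log (for example `unreachable_hosts(summarize_ssh_connects(open("debug.log")))`, where the file name is an assumption) gives a quick list of hosts worth checking for port 22 reachability. In the excerpt above, 10.87.36.19, 10.87.36.20 and 10.87.36.21 show only failed connections.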