So I'm not sure this is the same bug or not, but we just started running the 'check-tempest-dsvm-ironic-parallel-nv' in the Ironic pipeline yesterday and we are seeing a percentage of failures in test_network_advanced_server_ops @ test_server_connectivity_reboot. In our case:
* Original server boots
* First ssh connectivity check passes (via floating ip)
* Server reboots, become active
* Second ssh connectivity check fails:
- Connection is made to the server, but _get_ssh_connection() times out ~6 min. (!) later
The symptoms in our case look consistent across all failures and it certainly smells like something wrong in the guest. Unfortunately, Ironic does not support getting console log so I'm in the blind there.
2014-09-23 19:30:55,769 29820 INFO [paramiko.transport] Authentication (publickey) successful!
2014-09-23 19:30:55,771 29820 INFO [tempest.common.ssh] ssh connection to cirros@172.24.4.39 successfuly created
2014-09-23 19:30:56,069 29820 INFO [tempest.common.rest_client] Request (TestNetworkAdvancedServerOps:test_server_connectivity_reboot): 202 POST http://127.0.0.1:8774/v2/403a6c74810c47ea9b109f7f07e14c90/servers/788eaed6-284c-4b95-bef0-b08251eef043/action 0.117s
2014-09-23 19:30:56,168 29820 INFO [tempest.common.rest_client] Request (TestNetworkAdvancedServerOps:test_server_connectivity_reboot): 200 GET http://127.0.0.1:8774/v2/403a6c74810c47ea9b109f7f07e14c90/servers/788eaed6-284c-4b95-bef0-b08251eef043 0.099s
2014-09-23 19:30:57,283 29820 INFO [tempest.common.rest_client] Request (TestNetworkAdvancedServerOps:test_server_connectivity_reboot): 200 GET http://127.0.0.1:8774/v2/403a6c74810c47ea9b109f7f07e14c90/servers/788eaed6-284c-4b95-bef0-b08251eef043 0.098s
2014-09-23 19:30:57,297 29820 INFO [tempest.common.waiters] State transition "REBOOT/rebooting" ==> "REBOOT/reboot_started" after 1 second wait
2014-09-23 19:30:58,402 29820 INFO [tempest.common.rest_client] Request (TestNetworkAdvancedServerOps:test_server_connectivity_reboot): 200 GET http://127.0.0.1:8774/v2/403a6c74810c47ea9b109f7f07e14c90/servers/788eaed6-284c-4b95-bef0-b08251eef043 0.104s
2014-09-23 19:30:59,516 29820 INFO [tempest.common.rest_client] Request (TestNetworkAdvancedServerOps:test_server_connectivity_reboot): 200 GET http://127.0.0.1:8774/v2/403a6c74810c47ea9b109f7f07e14c90/servers/788eaed6-284c-4b95-bef0-b08251eef043 0.096s
2014-09-23 19:30:59,531 29820 INFO [tempest.common.waiters] State transition "REBOOT/reboot_started" ==> "ACTIVE/None" after 3 second wait
2014-09-23 19:30:59,531 29820 INFO [tempest.scenario.manager] Tenant networks not configured to be reachable.
2014-09-23 19:30:59,531 29820 DEBUG [tempest.scenario.manager] checking network connections to IP 172.24.4.39 with user: cirros
2014-09-23 19:30:59,541 29820 INFO [tempest.common.ssh] Creating ssh connection to '172.24.4.39' as 'cirros' with public key authentication
2014-09-23 19:30:59,548 29820 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2012.55)
2014-09-23 19:36:39,266 29820 ERROR [tempest.scenario.manager] Public network connectivity check failed
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager Traceback (most recent call last):
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager File "tempest/scenario/manager.py", line 670, in _check_public_network_connectivity
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager should_connect=should_connect)
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager File "tempest/scenario/manager.py", line 657, in _check_vm_connectivity
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager self.get_remote_client(ip_address, username, private_key)
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager File "tempest/scenario/manager.py", line 333, in get_remote_client
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager linux_client.validate_authentication()
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager File "tempest/common/utils/linux/remote_client.py", line 54, in validate_authentication
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager self.ssh_client.test_connection_auth()
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager File "tempest/common/ssh.py", line 151, in test_connection_auth
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager connection = self._get_ssh_connection()
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager File "tempest/common/ssh.py", line 76, in _get_ssh_connection
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager timeout=self.channel_timeout, pkey=self.pkey)
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager File "/usr/local/lib/python2.7/dist-packages/paramiko/client.py", line 265, in connect
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager t.start_client()
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager File "/usr/local/lib/python2.7/dist-packages/paramiko/transport.py", line 402, in start_client
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager event.wait(0.1)
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager File "/usr/lib/python2.7/threading.py", line 620, in wait
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager self.__cond.wait(timeout)
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager File "/usr/lib/python2.7/threading.py", line 358, in wait
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager _sleep(delay)
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager File "/usr/local/lib/python2.7/dist-packages/fixtures/_fixtures/timeout.py", line 52, in signal_handler
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager raise TimeoutException()
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager TimeoutException
2014-09-23 19:36:39.266 29820 TRACE tempest.scenario.manager
So I'm not sure this is the same bug or not, but we just started running the 'check- tempest- dsvm-ironic- parallel- nv' in the Ironic pipeline yesterday and we are seeing a percentage of failures in test_network_ advanced_ server_ ops @ test_server_ connectivity_ reboot. In our case:
* Original server boots connection( ) times out ~6 min. (!) later
* First ssh connectivity check passes (via floating ip)
* Server reboots, become active
* Second ssh connectivity check fails:
- Connection is made to the server, but _get_ssh_
The symptoms in our case look consistent across all failures and it certainly smells like something wrong in the guest. Unfortunately, Ironic does not support getting console log so I'm in the blind there.
2014-09-23 19:30:55,769 29820 INFO [paramiko. transport] Authentication (publickey) successful! common. ssh] ssh connection to cirros@172.24.4.39 successfuly created common. rest_client] Request (TestNetworkAdv ancedServerOps: test_server_ connectivity_ reboot) : 202 POST http:// 127.0.0. 1:8774/ v2/403a6c74810c 47ea9b109f7f07e 14c90/servers/ 788eaed6- 284c-4b95- bef0-b08251eef0 43/action 0.117s common. rest_client] Request (TestNetworkAdv ancedServerOps: test_server_ connectivity_ reboot) : 200 GET http:// 127.0.0. 1:8774/ v2/403a6c74810c 47ea9b109f7f07e 14c90/servers/ 788eaed6- 284c-4b95- bef0-b08251eef0 43 0.099s common. rest_client] Request (TestNetworkAdv ancedServerOps: test_server_ connectivity_ reboot) : 200 GET http:// 127.0.0. 1:8774/ v2/403a6c74810c 47ea9b109f7f07e 14c90/servers/ 788eaed6- 284c-4b95- bef0-b08251eef0 43 0.098s common. waiters] State transition "REBOOT/rebooting" ==> "REBOOT/ reboot_ started" after 1 second wait common. rest_client] Request (TestNetworkAdv ancedServerOps: test_server_ connectivity_ reboot) : 200 GET http:// 127.0.0. 1:8774/ v2/403a6c74810c 47ea9b109f7f07e 14c90/servers/ 788eaed6- 284c-4b95- bef0-b08251eef0 43 0.104s common. rest_client] Request (TestNetworkAdv ancedServerOps: test_server_ connectivity_ reboot) : 200 GET http:// 127.0.0. 1:8774/ v2/403a6c74810c 47ea9b109f7f07e 14c90/servers/ 788eaed6- 284c-4b95- bef0-b08251eef0 43 0.096s common. waiters] State transition "REBOOT/ reboot_ started" ==> "ACTIVE/None" after 3 second wait scenario. manager] Tenant networks not configured to be reachable. scenario. manager] checking network connections to IP 172.24.4.39 with user: cirros common. ssh] Creating ssh connection to '172.24.4.39' as 'cirros' with public key authentication transport] Connected (version 2.0, client dropbear_2012.55) scenario. manager] Public network connectivity check failed scenario. manager Traceback (most recent call last): scenario. manager File "tempest/ scenario/ manager. py", line 670, in _check_ public_ network_ connectivity scenario. manager should_ connect= should_ connect) scenario. manager File "tempest/ scenario/ manager. py", line 657, in _check_ vm_connectivity scenario. manager self.get_ remote_ client( ip_address, username, private_key) scenario. manager File "tempest/ scenario/ manager. py", line 333, in get_remote_client scenario. manager linux_client. validate_ authentication( ) scenario. manager File "tempest/ common/ utils/linux/ remote_ client. py", line 54, in validate_ authentication scenario. manager self.ssh_ client. test_connection _auth() scenario. manager File "tempest/ common/ ssh.py" , line 151, in test_connection _auth scenario. manager connection = self._get_ ssh_connection( ) scenario. manager File "tempest/ common/ ssh.py" , line 76, in _get_ssh_connection scenario. manager timeout= self.channel_ timeout, pkey=self.pkey) scenario. manager File "/usr/local/ lib/python2. 7/dist- packages/ paramiko/ client. py", line 265, in connect scenario. manager t.start_client() scenario. manager File "/usr/local/ lib/python2. 7/dist- packages/ paramiko/ transport. py", line 402, in start_client scenario. manager event.wait(0.1) scenario. manager File "/usr/lib/ python2. 7/threading. py", line 620, in wait scenario. manager self.__ cond.wait( timeout) scenario. manager File "/usr/lib/ python2. 7/threading. py", line 358, in wait scenario. manager _sleep(delay) scenario. manager File "/usr/local/ lib/python2. 7/dist- packages/ fixtures/ _fixtures/ timeout. py", line 52, in signal_handler scenario. manager raise TimeoutException() scenario. manager TimeoutException scenario. manager
2014-09-23 19:30:55,771 29820 INFO [tempest.
2014-09-23 19:30:56,069 29820 INFO [tempest.
2014-09-23 19:30:56,168 29820 INFO [tempest.
2014-09-23 19:30:57,283 29820 INFO [tempest.
2014-09-23 19:30:57,297 29820 INFO [tempest.
2014-09-23 19:30:58,402 29820 INFO [tempest.
2014-09-23 19:30:59,516 29820 INFO [tempest.
2014-09-23 19:30:59,531 29820 INFO [tempest.
2014-09-23 19:30:59,531 29820 INFO [tempest.
2014-09-23 19:30:59,531 29820 DEBUG [tempest.
2014-09-23 19:30:59,541 29820 INFO [tempest.
2014-09-23 19:30:59,548 29820 INFO [paramiko.
2014-09-23 19:36:39,266 29820 ERROR [tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.
2014-09-23 19:36:39.266 29820 TRACE tempest.