socket.timeout errors in DVR CI jobs cause SSH issues

Bug #1863858 reported by Slawek Kaplonski
This bug affects 1 person
Affects   Status     Importance  Assigned to  Milestone
neutron   Confirmed  Critical    Unassigned

Bug Description

Random tests fail, mostly in the neutron-tempest-dvr job, because of problems with SSH to the instance. The error always looks like this:

2020-02-18 18:24:34,987 22897 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.96:22' as 'cirros' with public key authentication
2020-02-18 18:25:35,048 22897 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.96 (timed out). Number attempts: 1. Retry after 2 seconds.
2020-02-18 18:26:37,609 22897 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.96 (timed out). Number attempts: 2. Retry after 3 seconds.
2020-02-18 18:27:41,173 22897 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.96 (timed out). Number attempts: 3. Retry after 4 seconds.
2020-02-18 18:28:45,701 22897 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.96 (timed out). Number attempts: 4. Retry after 5 seconds.
2020-02-18 18:29:51,265 22897 ERROR [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.96 after 4 attempts
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh Traceback (most recent call last):
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 107, in _get_ssh_connection
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh sock=proxy_chan)
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/client.py", line 349, in connect
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh retry_on_signal(lambda: sock.connect(addr))
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/util.py", line 283, in retry_on_signal
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh return function()
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/client.py", line 349, in <lambda>
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh retry_on_signal(lambda: sock.connect(addr))
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh socket.timeout: timed out
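
The timestamps in the log above can be used to estimate how long each connection attempt blocked before it failed. The sketch below is only an illustration: the helper name connect_durations is hypothetical, and it assumes that the delay announced in each "Retry after N seconds" message is the sleep taken before the next attempt. With the values copied from the log, every attempt comes out at roughly 60 seconds, which is consistent with each attempt running into a fixed connect timeout rather than being refused quickly.

```python
from datetime import datetime

# Timestamps copied from the tempest log above, in order.
STAMPS = [
    "2020-02-18 18:24:34,987",  # "Creating ssh connection"
    "2020-02-18 18:25:35,048",  # attempt 1 failed, retry after 2 seconds
    "2020-02-18 18:26:37,609",  # attempt 2 failed, retry after 3 seconds
    "2020-02-18 18:27:41,173",  # attempt 3 failed, retry after 4 seconds
    "2020-02-18 18:28:45,701",  # attempt 4 failed, retry after 5 seconds
    "2020-02-18 18:29:51,265",  # final failure, SSHTimeout raised
]

# Assumed sleep before each attempt, taken from the "Retry after N seconds"
# messages; the first attempt starts without a preceding sleep.
SLEEPS = [0, 2, 3, 4, 5]


def connect_durations(stamps, sleeps):
    """Return the seconds each attempt spent waiting, excluding the sleeps."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S,%f") for s in stamps]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return [round(gap - sleep, 3) for gap, sleep in zip(gaps, sleeps)]


print(connect_durations(STAMPS, SLEEPS))
```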

And then at the end of the test:

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 107, in _get_ssh_connection
    sock=proxy_chan)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/client.py", line 349, in connect
    retry_on_signal(lambda: sock.connect(addr))
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/util.py", line 283, in retry_on_signal
    return function()
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/client.py", line 349, in <lambda>
    retry_on_signal(lambda: sock.connect(addr))
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 89, in wrapper
    return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/api/compute/servers/test_attach_interfaces.py", line 229, in test_create_list_show_delete_interfaces_by_network_port
    server, ifs = self._create_server_get_interfaces()
  File "/opt/stack/tempest/tempest/api/compute/servers/test_attach_interfaces.py", line 88, in _create_server_get_interfaces
    self._wait_for_validation(server, validation_resources)
  File "/opt/stack/tempest/tempest/api/compute/servers/test_attach_interfaces.py", line 73, in _wait_for_validation
    linux_client.validate_authentication()
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 60, in wrapper
    six.reraise(*original_exception)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 33, in wrapper
    return function(self, *args, **kwargs)
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 116, in validate_authentication
    self.ssh_client.test_connection_auth()
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 209, in test_connection_auth
    connection = self._get_ssh_connection()
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 121, in _get_ssh_connection
    password=self.password)
tempest.lib.exceptions.SSHTimeout: Connection to the 172.24.5.96 via SSH timed out.
User: cirros, Password: password

From the console log, it seems that the fixed IP was properly configured on the instance and that the metadata service worked fine too.
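
The first traceback ends in sock.connect(addr), so the timeout happens while opening the TCP connection, before any SSH authentication takes place. A minimal sketch of a standalone probe that checks only this TCP step is shown below. It is not tempest code: the function name wait_for_tcp, its parameters, and the default timeouts are assumptions chosen to resemble the retry pattern visible in the log (a growing sleep between attempts and an overall deadline).

```python
import socket
import time


def wait_for_tcp(host, port, per_try_timeout=60.0, overall_timeout=300.0,
                 connect=socket.create_connection, sleep=time.sleep,
                 clock=time.monotonic):
    """Try to open a TCP connection, sleeping a bit longer after each failure.

    Returns the number of failed attempts that preceded the first success.
    Raises TimeoutError if an attempt fails after overall_timeout has elapsed.
    The connect, sleep and clock arguments exist only to make testing easy.
    """
    start = clock()
    delay = 1
    failures = 0
    while True:
        try:
            sock = connect((host, port), timeout=per_try_timeout)
            sock.close()
            return failures
        except OSError as exc:  # socket.timeout is a subclass of OSError
            if clock() - start >= overall_timeout:
                raise TimeoutError(
                    "no TCP connection to %s:%s after %d attempts"
                    % (host, port, failures)) from exc
            failures += 1
            delay += 1  # 2, 3, 4, ... seconds, as in the log messages
            sleep(delay)
```

Calling wait_for_tcp("172.24.5.96", 22) from the test node would show whether port 22 on the floating IP is reachable at all. A TimeoutError here points at the network path to the instance rather than at the SSH keys or credentials.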

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/708624

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.opendev.org/708624
Reason: it is not this patch for sure, so this revert is not needed anymore
