socket.timeout error in dvr CI jobs causes SSH issues

Bug #1863858 reported by Slawek Kaplonski on 2020-02-19
This bug affects 1 person
Affects: neutron
Importance: Critical
Assigned to: Unassigned

Bug Description

Random tests fail, mostly in the neutron-tempest-dvr job, because of problems with SSH to the instance. The error always looks like this:

2020-02-18 18:24:34,987 22897 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.96:22' as 'cirros' with public key authentication
2020-02-18 18:25:35,048 22897 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.96 (timed out). Number attempts: 1. Retry after 2 seconds.
2020-02-18 18:26:37,609 22897 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.96 (timed out). Number attempts: 2. Retry after 3 seconds.
2020-02-18 18:27:41,173 22897 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.96 (timed out). Number attempts: 3. Retry after 4 seconds.
2020-02-18 18:28:45,701 22897 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.96 (timed out). Number attempts: 4. Retry after 5 seconds.
2020-02-18 18:29:51,265 22897 ERROR [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.96 after 4 attempts
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh Traceback (most recent call last):
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 107, in _get_ssh_connection
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh sock=proxy_chan)
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/client.py", line 349, in connect
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh retry_on_signal(lambda: sock.connect(addr))
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/util.py", line 283, in retry_on_signal
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh return function()
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/client.py", line 349, in <lambda>
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh retry_on_signal(lambda: sock.connect(addr))
2020-02-18 18:29:51.265 22897 ERROR tempest.lib.common.ssh socket.timeout: timed out
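
The timestamps in the log above can be used to estimate how long each connection attempt waited before giving up. The sketch below is only an illustration: the names STAMPS, BACKOFFS and attempt_durations are made up for this example, and it assumes that the "Retry after N seconds" sleep happens between one failure message and the start of the next attempt.

```python
from datetime import datetime

# Timestamps copied from the tempest.lib.common.ssh log lines above:
# the "Creating ssh connection" line followed by each failure message.
STAMPS = [
    "2020-02-18 18:24:34,987",
    "2020-02-18 18:25:35,048",
    "2020-02-18 18:26:37,609",
    "2020-02-18 18:27:41,173",
    "2020-02-18 18:28:45,701",
    "2020-02-18 18:29:51,265",
]

# "Retry after N seconds" values from the log. Assumption: the sleep
# precedes the next attempt, so nothing is subtracted from the first gap.
BACKOFFS = [0, 2, 3, 4, 5]


def attempt_durations(stamps, backoffs):
    """Return the seconds spent in each attempt, with the backoff sleep removed."""
    times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S,%f") for s in stamps]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return [round(gap - sleep, 3) for gap, sleep in zip(gaps, backoffs)]


if __name__ == "__main__":
    for n, secs in enumerate(attempt_durations(STAMPS, BACKOFFS), start=1):
        print(f"attempt {n}: about {secs:.1f} s before timing out")
```

Under that assumption every attempt waits a little over 60 seconds, which is consistent with each try running into the same connect timeout rather than failing quickly.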

And then at the end of the test:

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 107, in _get_ssh_connection
    sock=proxy_chan)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/client.py", line 349, in connect
    retry_on_signal(lambda: sock.connect(addr))
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/util.py", line 283, in retry_on_signal
    return function()
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/client.py", line 349, in <lambda>
    retry_on_signal(lambda: sock.connect(addr))
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 89, in wrapper
    return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/api/compute/servers/test_attach_interfaces.py", line 229, in test_create_list_show_delete_interfaces_by_network_port
    server, ifs = self._create_server_get_interfaces()
  File "/opt/stack/tempest/tempest/api/compute/servers/test_attach_interfaces.py", line 88, in _create_server_get_interfaces
    self._wait_for_validation(server, validation_resources)
  File "/opt/stack/tempest/tempest/api/compute/servers/test_attach_interfaces.py", line 73, in _wait_for_validation
    linux_client.validate_authentication()
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 60, in wrapper
    six.reraise(*original_exception)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 33, in wrapper
    return function(self, *args, **kwargs)
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 116, in validate_authentication
    self.ssh_client.test_connection_auth()
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 209, in test_connection_auth
    connection = self._get_ssh_connection()
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 121, in _get_ssh_connection
    password=self.password)
tempest.lib.exceptions.SSHTimeout: Connection to the 172.24.5.96 via SSH timed out.
User: cirros, Password: password

From the console log, it seems that the fixed IP was properly configured on the instance and that the metadata service worked fine too.
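
Both tracebacks end in sock.connect(addr) raising socket.timeout, which suggests the failure happens while opening the TCP connection to port 22, before any SSH authentication takes place. A minimal way to check this by hand from the test node, using only the Python standard library, could look like the sketch below. The function name tcp_port_state and the 5 second default timeout are assumptions made for this example; they are not part of tempest or paramiko.

```python
import socket


def tcp_port_state(host, port=22, timeout=5.0):
    """Classify a plain TCP connect attempt to host:port.

    Returns "open", "timeout", "refused" or "error: <details>".
    """
    try:
        # create_connection performs only the TCP handshake; no SSH
        # protocol traffic or authentication is involved.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # Same exception type as in the tracebacks above.
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        return f"error: {exc}"


# Example call, using the floating IP from this report:
#     print(tcp_port_state("172.24.5.96"))
```

A "timeout" result would point at packets being dropped somewhere between the test node and the instance, while "refused" would mean the address is reachable but nothing is listening on port 22.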

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.opendev.org/708624
Reason: it is not this patch for sure, so this revert is not needed anymore
