SSH timeout (wait timeout) due to potential paramiko issue

Bug #1926109 reported by LIU Yulong
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
neutron
Fix Released
Critical
Unassigned
tempest
Fix Released
Undecided
Unassigned

Bug Description

Recently, nothing changed on the test case, but we got this failure:
https://04a9f9fdd9afdf12de4e-f889a65b4dfb1f628c8309e9eb44b225.ssl.cf2.rackcdn.com/787304/5/check/neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid/bb9e9a9/testr_results.html

LOG:

Traceback (most recent call last):
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/scenario/test_port_forwardings.py", line 160, in test_port_forwarding_editing_and_deleting_tcp_rule
    utils.wait_until_true(
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/common/utils.py", line 79, in wait_until_true
    while not predicate():
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/scenario/test_port_forwardings.py", line 153, in no_fip_pf_connectivity
    return not fip_pf_connectivity(6)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/scenario/test_port_forwardings.py", line 146, in fip_pf_connectivity
    self.check_servers_hostnames(
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/scenario/base.py", line 537, in check_servers_hostnames
    ssh_client.get_hostname())
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/common/ssh.py", line 292, in get_hostname
    return self.exec_command('hostname')
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/tenacity/__init__.py", line 333, in wrapped_f
    return self(f, *args, **kw)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/tenacity/__init__.py", line 423, in __call__
    do = self.iter(retry_state=retry_state)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/tenacity/__init__.py", line 360, in iter
    return fut.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/tenacity/__init__.py", line 426, in __call__
    result = fn(*args, **kwargs)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/common/ssh.py", line 171, in exec_command
    return super(Client, self).exec_command(cmd=cmd, encoding=encoding)
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 162, in exec_command
    channel.exec_command(cmd)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/paramiko/channel.py", line 72, in _check
    return func(self, *args, **kwds)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/paramiko/channel.py", line 257, in exec_command
    self._wait_for_event()
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/paramiko/channel.py", line 1219, in _wait_for_event
    self.event.wait()
  File "/usr/lib/python3.8/threading.py", line 558, in wait
    signaled = self._cond.wait(timeout)
  File "/usr/lib/python3.8/threading.py", line 302, in wait
    waiter.acquire()
  File "/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/fixtures/_fixtures/timeout.py", line 52, in signal_handler
    raise TimeoutException()
fixtures._fixtures.timeout.TimeoutException

Some search in the neutron LP bug list, there are some old bugs related to this project "paramiko":
https://bugs.launchpad.net/neutron/+bug/1892861
https://bugs.launchpad.net/neutron/+bug/1571486
https://bugs.launchpad.net/neutron/+bug/1253896

The issue seems to be _wait_for_event is waiting for something, and finally get timeout.

After some searching in the repo of paramiko in github, I saw this issue:
https://github.com/paramiko/paramiko/pull/1248
With a commit for this:
https://github.com/paramiko/paramiko/pull/1248/commits/b059139ea712a46f632fe20e3c278414f311c21d
But it is not included in any release version.

Some thoughts:
1. if it is possible to not use paramiko?
2. run (ssh) commands directly by some oslo libs?
3. ping paramiko maintainers to release once to see if the test success rate can get better

Tags: gate-failure
tags: added: gate-failure
Revision history for this message
LIU Yulong (dragon889) wrote :
Download full text (5.3 KiB)

Similar failures:
(1) https://04f7ac194e853c778b5c-637f8ccb892d615b1b6b9e58fd553696.ssl.cf2.rackcdn.com/787474/1/check/neutron-tempest-plugin-scenario-linuxbridge-train/9358492/testr_results.html

Traceback (most recent call last):
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/neutron_tempest_plugin/scenario/test_multicast.py", line 275, in test_multicast_between_vms_on_same_network
    unregistered=unregistered)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/neutron_tempest_plugin/scenario/test_multicast.py", line 306, in _check_multicast_conectivity
    self._prepare_unregistered(unregistered, mcast_address)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/neutron_tempest_plugin/scenario/test_multicast.py", line 251, in _prepare_unregistered
    addresses = ip_command.list_addresses(port=server['port'])
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/neutron_tempest_plugin/common/ip.py", line 120, in list_addresses
    output = self.execute('address', *command)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/neutron_tempest_plugin/common/ip.py", line 58, in execute
    timeout=self.timeout).stdout
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/neutron_tempest_plugin/common/shell.py", line 69, in execute
    ssh_client=ssh_client)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/neutron_tempest_plugin/common/shell.py", line 103, in execute_remote_command
    stdout = ssh_client.exec_command(command, timeout=timeout)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/tenacity/__init__.py", line 333, in wrapped_f
    return self(f, *args, **kw)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/tenacity/__init__.py", line 423, in __call__
    do = self.iter(retry_state=retry_state)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/tenacity/__init__.py", line 360, in iter
    return fut.result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/tenacity/__init__.py", line 426, in __call__
    result = fn(*args, **kwargs)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/neutron_tempest_plugin/common/ssh.py", line 171, in exec_command
    return super(Client, self).exec_command(cmd=cmd, encoding=encoding)
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 160, in exec_command
    with transport.open_session() as channel:
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/transport.py", line 879, in open_session
    timeout=timeout,
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/paramiko/transport.py", line 1001, in open_channel
    event.wait(0.1)
  File "/usr/lib/python3.6/threading.py", line 551, in wait
    signaled = self._cond.wait(timeout)
  File "/usr/lib/python3.6/threading.py", line 299, in wait
    gotit = waiter.acquire(True, timeout)
  File...

Read more...

Changed in neutron:
importance: Undecided → Critical
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

I wonder if it's not the same problem as in https://bugs.launchpad.net/neutron/+bug/1926838 - can You check in such failing test how long it took router to be switched to "primary"?

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

I believe this isn't an issue in the gate anymore. Maybe it was fixed by https://review.opendev.org/c/openstack/tempest/+/761964

Changed in tempest:
status: New → Fix Released
Changed in neutron:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.