SSH fails in neutron-ovn-tripleo-ci-centos-8-containers-multinode job

Bug #1891307 reported by Slawek Kaplonski
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Jakub Libosvar
Milestone: (none listed)

Bug Description

Most of the tests in the neutron-ovn-tripleo-ci-centos-8-containers-multinode job are failing due to SSH issues.
Example error:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 89, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python3.6/site-packages/tempest/scenario/test_network_basic_ops.py", line 490, in test_connectivity_between_vms_on_different_networks
    self._check_public_network_connectivity(should_connect=True)
  File "/usr/lib/python3.6/site-packages/tempest/scenario/test_network_basic_ops.py", line 213, in _check_public_network_connectivity
    message, server, mtu=mtu)
  File "/usr/lib/python3.6/site-packages/tempest/scenario/manager.py", line 728, in check_vm_connectivity
    server=server)
  File "/usr/lib/python3.6/site-packages/tempest/scenario/manager.py", line 502, in get_remote_client
    linux_client.validate_authentication()
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/utils/linux/remote_client.py", line 60, in wrapper
    six.reraise(*original_exception)
  File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/utils/linux/remote_client.py", line 33, in wrapper
    return function(self, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/utils/linux/remote_client.py", line 116, in validate_authentication
    self.ssh_client.test_connection_auth()
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/ssh.py", line 216, in test_connection_auth
    connection = self._get_ssh_connection()
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/ssh.py", line 128, in _get_ssh_connection
    password=self.password)
tempest.lib.exceptions.SSHTimeout: Connection to the 192.168.24.129 via SSH timed out.
User: cirros, Password: None

Link to the failed job: https://zuul.opendev.org/t/openstack/build/caba73ac413f43079f49737430191613

Jakub Libosvar (libosvar) wrote :

The reason is that the VM can't reach the metadata service: the agent can't provision the datapath because of a privsep error:

2020-08-12 09:44:27.493 136893 DEBUG oslo.privsep.daemon [-] privsep: Exception during request[140702767493648]: __init__() got an unexpected keyword argument 'libc' _process_cmd /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:490
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 485, in _process_cmd
    ret = func(*f_args, **f_kwargs)
  File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 249, in _wrap
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 377, in interface_exists
    idx = get_link_id(ifname, namespace, raise_exception=False)
  File "/usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 243, in get_link_id
    with get_iproute(namespace) as ip:
  File "/usr/lib/python3.6/site-packages/neutron/privileged/agent/linux/ip_lib.py", line 193, in get_iproute
    return pyroute2.NetNS(namespace, flags=0, libc=priv_linux.get_cdll())
TypeError: __init__() got an unexpected keyword argument 'libc'

Maybe a pyroute2 version mismatch? I will investigate further.
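
For illustration, the TypeError above is what Python raises when a caller passes a keyword that the installed constructor does not declare. The sketch below reproduces that failure mode with hypothetical stand-in classes (OldNetNS and NewNetNS are assumptions made for this example, not the real pyroute2.NetNS), and shows one way to check a signature before calling:

```python
import inspect


class OldNetNS:
    """Hypothetical stand-in for a constructor that predates the 'libc' keyword."""

    def __init__(self, netns=None, flags=0):
        self.netns = netns
        self.flags = flags


class NewNetNS:
    """Hypothetical stand-in for a constructor that accepts 'libc'."""

    def __init__(self, netns=None, flags=0, libc=None):
        self.netns = netns
        self.flags = flags
        self.libc = libc


def accepts_keyword(cls, name):
    """Return True if cls.__init__ declares `name` or takes **kwargs."""
    params = inspect.signature(cls.__init__).parameters
    if name in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def try_construct(cls):
    """Call cls with the same arguments as the traceback and report the outcome."""
    try:
        cls("ns-test", flags=0, libc=object())
        return "ok"
    except TypeError as exc:
        return "TypeError: %s" % exc


if __name__ == "__main__":
    for cls in (OldNetNS, NewNetNS):
        print(cls.__name__, accepts_keyword(cls, "libc"), try_construct(cls))
```

Running the same kind of signature check against the installed pyroute2 inside the agent container would be one way to confirm the mismatch, assuming the real constructor lists its keywords explicitly.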

Changed in neutron:
assignee: nobody → Jakub Libosvar (libosvar)

Jakub Libosvar (libosvar) wrote :

It seems the pyroute2 version is far too low. Since the patch https://opendev.org/openstack/neutron/commit/68e5e1b8fe87b0b4938236f8f8570d92ae044e20 we require pyroute2 0.5.13 or later, but the version in use is 0.5.6-2, which doesn't implement this interface yet.
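
A minimal sketch of the version comparison, using only the two version strings quoted in this comment. The helper names (version_tuple, meets_minimum) are made up for this example, and the parsing is deliberately simplistic: it drops the packaging release suffix after the dash and compares the numeric components only.

```python
import re


def version_tuple(version):
    """Turn '0.5.6-2' into (0, 5, 6), dropping the packaging release suffix."""
    upstream = version.split("-", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", upstream))


def meets_minimum(installed, required):
    """Compare versions numerically, component by component."""
    return version_tuple(installed) >= version_tuple(required)


installed = "0.5.6-2"  # version found in the job, per this comment
required = "0.5.13"    # minimum needed, per this comment

print(meets_minimum(installed, required))  # False: 6 < 13 numerically
print(installed >= required)               # True: naive string comparison is misleading
```

The second print shows why comparing the raw strings is not enough: "6" sorts after "1" character by character, so 0.5.6 would wrongly appear newer than 0.5.13.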

Slawek Kaplonski (slaweq) wrote :

It seems to be much better now: http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?viewPanel=28&orgId=1
Also, the patch https://review.rdoproject.org/r/#/c/29002/ has merged, so the pyroute2 version should already be bumped in this job.
So I think we can close this bug for now. Thanks, Jakub, for taking care of it.

Changed in neutron:
status: Confirmed → Fix Released