standalone tempest fails "Exhausted all hosts available for retrying build failures for instance"

Bug #1810325 reported by wes hayutin on 2019-01-02
Affects: tripleo
Importance: Critical
Assigned to: Unassigned

Bug Description

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tempest/common/utils/__init__.py", line 89, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python2.7/site-packages/tempest/api/compute/servers/test_attach_interfaces.py", line 335, in test_add_remove_fixed_ip
    server, ifs = self._create_server_get_interfaces()
  File "/usr/lib/python2.7/site-packages/tempest/api/compute/servers/test_attach_interfaces.py", line 77, in _create_server_get_interfaces
    wait_until='ACTIVE')
  File "/usr/lib/python2.7/site-packages/tempest/api/compute/base.py", line 246, in create_test_server
    **kwargs)
  File "/usr/lib/python2.7/site-packages/tempest/common/compute.py", line 256, in create_test_server
    server['id'])
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/site-packages/tempest/common/compute.py", line 227, in create_test_server
    clients.servers_client, server['id'], wait_until)
  File "/usr/lib/python2.7/site-packages/tempest/common/waiters.py", line 76, in wait_for_server_status
    server_id=server_id)
tempest.exceptions.BuildErrorException: Server d5c35314-1054-4010-8c05-b0a8f99d67a5 failed to build and is in ERROR status
Details: {u'message': u'Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance d5c35314-1054-4010-8c05-b0a8f99d67a5.', u'code': 500, u'created': u'2019-01-02T17:19:34Z'}

http://logs.openstack.org/47/627047/1/gate/tripleo-ci-centos-7-standalone/8d55ccc/logs/stackviz/#/testrepository.subunit/test-details/tempest.api.compute.servers.test_attach_interfaces.AttachInterfacesUnderV243Test.test_add_remove_fixed_ip

Logstash
http://logstash.openstack.org/#dashboard/file/logstash.json?query=(message%3A%5C%22Exhausted%20all%20hosts%20available%20for%20retrying%20build%20failures%20for%20instance%5C%22)%20AND%20tags%3Aconsole%20AND%20voting%3A1%20AND%20project%3A%5C%22openstack%2Ftripleo-heat-templates%5C%22

wes hayutin (weshayutin) wrote :

I think reducing the number of workers will fix this; I'll post a patch in a few minutes.

Alex Schultz (alex-schultz) wrote :

http://logs.openstack.org/47/627047/1/gate/tripleo-ci-centos-7-standalone/8d55ccc/logs/undercloud/var/log/containers/nova/nova-conductor.log.txt.gz#_2019-01-02_17_19_34_368

2019-01-02 17:19:34.368 8 ERROR nova.scheduler.utils [req-460efc8b-dbec-4528-a55e-bff21dcb8ecc 3444e9da1e8545669f85599e24b7dca5 33cdfefea3834fe8bb73959b013bf10f - default default] [instance: d5c35314-1054-4010-8c05-b0a8f99d67a5] Error from last host: centos-7-rax-ord-0001457292.localdomain (node centos-7-rax-ord-0001457292.localdomain): [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1937, in _do_build_and_run_instance\n filter_properties, request_spec)\n', u' File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2217, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u'RescheduledException: Build of instance d5c35314-1054-4010-8c05-b0a8f99d67a5 was re-scheduled: Failure running os_vif plugin plug method: Failed to plug VIF VIFBridge(active=False,address=fa:16:3e:46:d5:6e,bridge_name=\'qbr18ca8b75-f0\',has_traffic_filtering=True,id=18ca8b75-f0b5-4808-997c-4de0c0b5f388,network=Network(09ec2145-d219-41ab-a93a-b274db970173),plugin=\'ovs\',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name=\'tap18ca8b75-f0\'). Got error: Class PyRoute2 cannot be found ([\'Traceback (most recent call last):\\n\', \' File "/usr/lib/python2.7/site-packages/oslo_utils/importutils.py", line 32, in import_class\\n return getattr(sys.modules[mod_str], class_str)\\n\', "AttributeError: \'module\' object has no attribute \'PyRoute2\'\\n"])\n']

wes hayutin (weshayutin) wrote :

Alex pointed this out:

Got error: Class PyRoute2 cannot be found
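
The underlying error is buried several levels deep in the fault message. As a rough illustration only, the sketch below pulls the innermost "Got error:" text out of such a message. The helper name root_cause and the regular expression are assumptions made for this example, and the sample string is an abbreviated copy of the log line quoted above (the VIF details are shortened to "..." here for brevity).

```python
import re

# Abbreviated copy of the fault text quoted in this report; the "(...)" after
# VIFBridge is our shortening, not part of the original log line.
FAULT = (
    "RescheduledException: Build of instance "
    "d5c35314-1054-4010-8c05-b0a8f99d67a5 was re-scheduled: "
    "Failure running os_vif plugin plug method: Failed to plug VIF "
    "VIFBridge(...). Got error: Class PyRoute2 cannot be found (['Traceback"
)

# Capture the text after "Got error:" up to the parenthesis that opens the
# embedded traceback.
ROOT_CAUSE = re.compile(r"Got error: (.*?)\s*\(")


def root_cause(message):
    """Hypothetical helper: return the last 'Got error' text, or None."""
    matches = ROOT_CAUSE.findall(message)
    return matches[-1] if matches else None


print(root_cause(FAULT))
```

A helper like this could be run over nova-conductor.log lines to group build failures by their innermost error rather than by the generic "Exhausted all hosts" message.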

wes hayutin (weshayutin) on 2019-01-02
Changed in tripleo:
milestone: none → stein-2

wes hayutin (weshayutin) wrote :

Failed to plug VIF VIFBridge(active=False,address=fa:16:3e:30:52:12,bridge_name='qbra7c92389-24',has_traffic_filtering=True,id=a7c92389-24b4-4c72-bd18-b216848572b9,network=Network(166f2921-a3a8-4060-ad55-5ca6faf3e2a4),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tapa7c92389-24'). Got error: Class PyRoute2 cannot be found (['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/site-packages/oslo_utils/importutils.py", line 32, in import_class\n return getattr(sys.modules[mod_str], class_str)\n', "AttributeError: 'module' object has no attribute 'PyRoute2'\n"])
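
For readers unfamiliar with where this message comes from: the traceback shows import_class failing on getattr(sys.modules[mod_str], class_str). The sketch below is a simplified stand-in written for this note, modelled on those quoted lines rather than copied from oslo.utils, and the module name fake_impl_pyroute2 is hypothetical. It shows one way the message can arise, namely a module that is already registered in sys.modules but does not (yet) define the class. It is not a confirmed root cause for this bug.

```python
import sys
import traceback
import types


def import_class(import_str):
    """Simplified stand-in: import a module, then look the class up on it."""
    mod_str, _sep, class_str = import_str.rpartition('.')
    __import__(mod_str)
    try:
        return getattr(sys.modules[mod_str], class_str)
    except AttributeError:
        raise ImportError('Class %s cannot be found (%s)' % (
            class_str, traceback.format_exception(*sys.exc_info())))


# Hypothetical module that is registered but has no PyRoute2 attribute,
# standing in for a module whose body has not finished executing.
half_loaded = types.ModuleType('fake_impl_pyroute2')
sys.modules['fake_impl_pyroute2'] = half_loaded

try:
    import_class('fake_impl_pyroute2.PyRoute2')
except ImportError as exc:
    message = str(exc)
    print(message.split(' (')[0])
```

The point of the sketch is that the lookup fails even though the module name itself resolves, which matches the shape of the AttributeError in the log.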

Reviewed: https://review.openstack.org/628047
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=69fb81b2d997a35fa6d84160baf529cc32297b6c
Submitter: Zuul
Branch: master

commit 69fb81b2d997a35fa6d84160baf529cc32297b6c
Author: Wes Hayutin <email address hidden>
Date: Wed Jan 2 14:08:25 2019 -0700

    skip delete intf by fixed ip

    We have a random failure in standalone jobs
    Got error: Class PyRoute2 cannot be found

    Related-Bug: #1810325
    Change-Id: Ia0bb30ae8242324491e80ad0e4b7a6f09ec372e1

chandan kumar (chkumar246) wrote :

I ran the reproducer with the test removed from the skip list, but could not reproduce the failure:
+ export TEMPESTDATA=/home/zuul
+ TEMPESTDATA=/home/zuul
+ /usr/bin/tempest list-plugins
+----------------+------------------------------------------------------+
| Name           | EntryPoint                                           |
+----------------+------------------------------------------------------+
| neutron_tests  | neutron_tempest_plugin.plugin:NeutronTempestPlugin   |
| keystone_tests | keystone_tempest_plugin.plugin:KeystoneTempestPlugin |
+----------------+------------------------------------------------------+
+ /usr/bin/tempest run --regex tempest.api.compute.servers.test_attach_interfaces.AttachInterfacesUnderV243Test.test_add_remove_fixed_ip --concurrency 2
/usr/lib/python2.7/site-packages/paramiko/rsakey.py:119: DeprecationWarning: signer and verifier have been deprecated. Please use sign and verify instead.
  algorithm=hashes.SHA1(),
{0} tempest.api.compute.servers.test_attach_interfaces.AttachInterfacesUnderV243Test.test_add_remove_fixed_ip [51.885894s] ... ok

======
Totals
======
Ran: 1 tests in 73.0000 sec.
 - Passed: 1
 - Skipped: 0
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 51.8859 sec.

==============
Worker Balance
==============
 - Worker 0 (1 tests) => 0:00:51.885894
[zuul@reprosubnode-0 ~]$

wes hayutin (weshayutin) wrote :

Removing the alert tag since the test is now skipped; sending this to the neutron team via cix.

tags: added: promotion-blocker
removed: alert

There is a problem in os-vif in the way the IP libraries (for Linux and Windows) are loaded: they are loaded inside a privsep context instead of being loaded outside the context and then called.

Instead, os-vif needs a conditional load, something like:
import os

if os.name == 'nt':
    from os_vif.internal.command.ip.windows.impl_netifaces import Netifaces as ip_lib_class
else:
    from os_vif.internal.command.ip.linux.impl_pyroute2 import PyRoute2 as ip_lib_class

and then use this class (both the Linux and Windows implementations expose the same interface, os_vif.internal.command.ip.ip_command.IpCommand).
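
A self-contained sketch of that pattern follows, so it can be run without os-vif installed. Everything in it is a stand-in invented for illustration: the IpCommand class here only mimics the idea of a shared interface, FakePyRoute2 and FakeNetifaces are hypothetical placeholders for the real implementations, and the exists method is an assumed example method, not a statement about the real API.

```python
import abc
import os


class IpCommand(abc.ABC):
    """Stand-in for the shared interface (hypothetical single method)."""

    @abc.abstractmethod
    def exists(self, device):
        """Return True if the named device exists."""


class FakePyRoute2(IpCommand):
    """Placeholder for the Linux implementation."""

    def exists(self, device):
        return device == 'lo'


class FakeNetifaces(IpCommand):
    """Placeholder for the Windows implementation."""

    def exists(self, device):
        return device == 'Loopback'


# The implementation is chosen once, at module import time, before any
# privileged call is made.
if os.name == 'nt':
    ip_lib_class = FakeNetifaces
else:
    ip_lib_class = FakePyRoute2

ip_lib = ip_lib_class()


def device_exists(device):
    """Callers use the shared interface only, never a concrete class."""
    return ip_lib.exists(device)
```

Because the class is bound when the module is first imported, later calls no longer depend on a by-name class lookup succeeding at call time, which is the step that fails in the traceback above.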

Changed in tripleo:
milestone: stein-2 → stein-3