MessagingTimeout errors in unit tests

Bug #1371587 reported by Hans Lindgren
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

These errors can be seen all over the unit test logs. At least some of them are caused by tests that fail to mock calls to the conductor API method build_instances(), which spawns new threads to handle the builds. The timeouts happen when a call to the scheduler gets no reply within the configured RPC timeout of 60 seconds.

These timeouts do not actually cause any test failures, but they make debugging harder because the errors show up randomly in the logs.
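As a rough illustration of the missing mock, here is a minimal sketch using Python 3's unittest.mock (the gate job in this report ran on Python 2.7, where the separate mock package would be used instead). ConductorAPI, ComputeAPI and create_with_mocked_build are hypothetical stand-ins written for this example, not nova's real classes; only the method name build_instances() comes from the report.

```python
import threading
from unittest import mock


class ConductorAPI:
    """Hypothetical stand-in for the conductor API (not nova's class)."""

    def build_instances(self, instances):
        # Hand the build to a background thread, which would go on to
        # make a blocking call to the scheduler.
        worker = threading.Thread(target=self._build, args=(instances,))
        worker.start()
        return worker

    def _build(self, instances):
        # Placeholder for the work that ends in the scheduler call.
        raise RuntimeError('unmocked build reached the scheduler call')


class ComputeAPI:
    """Hypothetical caller that a unit test wants to exercise."""

    def __init__(self):
        self.conductor = ConductorAPI()

    def create(self, instances):
        self.conductor.build_instances(instances)
        return instances


def create_with_mocked_build(instances):
    """Run create() with build_instances() replaced by a mock."""
    api = ComputeAPI()
    with mock.patch.object(ConductorAPI, 'build_instances') as fake_build:
        # The mock records the call but never starts a worker thread.
        result = api.create(instances)
    return result, fake_build
```

With the patch in place the test can still assert that the build was requested, for example with fake_build.assert_called_once_with(['inst-1']), while no background thread is left running after the test finishes.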

A typical error looks like this:

Traceback (most recent call last):
  File "nova/conductor/manager.py", line 614, in build_instances
    request_spec, filter_properties)
  File "nova/scheduler/client/__init__.py", line 49, in select_destinations
    context, request_spec, filter_properties)
  File "nova/scheduler/client/__init__.py", line 35, in __run_method
    return getattr(self.instance, __name)(*args, **kwargs)
  File "nova/scheduler/client/query.py", line 34, in select_destinations
    context, request_spec, filter_properties)
  File "nova/scheduler/rpcapi.py", line 107, in select_destinations
    request_spec=request_spec, filter_properties=filter_properties)
  File "/home/jenkins/workspace/gate-nova-python27/.tox/py27/local/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 152, in call
    retry=self.retry)
  File "/home/jenkins/workspace/gate-nova-python27/.tox/py27/local/lib/python2.7/site-packages/oslo/messaging/transport.py", line 90, in _send
    timeout=timeout, retry=retry)
  File "/home/jenkins/workspace/gate-nova-python27/.tox/py27/local/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_fake.py", line 194, in send
    return self._send(target, ctxt, message, wait_for_reply, timeout)
  File "/home/jenkins/workspace/gate-nova-python27/.tox/py27/local/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_fake.py", line 186, in _send
    'No reply on topic %s' % target.topic)
MessagingTimeout: No reply on topic scheduler
WARNING [nova.scheduler.driver] Setting instance to ERROR state.

The timeout is then followed by an attempt to set the instance to ERROR state, which fails because the instance does not exist in the database:

Traceback (most recent call last):
  File "/home/jenkins/workspace/gate-nova-python27/.tox/py27/local/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 455, in fire_timers
    timer()
  File "/home/jenkins/workspace/gate-nova-python27/.tox/py27/local/lib/python2.7/site-packages/eventlet/hubs/timer.py", line 58, in __call__
    cb(*args, **kw)
  File "nova/utils.py", line 949, in wrapper
    return func(*args, **kwargs)
  File "nova/conductor/manager.py", line 618, in build_instances
    instance.uuid, request_spec)
  File "nova/scheduler/driver.py", line 67, in handle_schedule_error
    'task_state': None})
  File "nova/db/api.py", line 746, in instance_update_and_get_original
    columns_to_join=columns_to_join)
  File "nova/db/sqlalchemy/api.py", line 143, in wrapper
    return f(*args, **kwargs)
  File "nova/db/sqlalchemy/api.py", line 2282, in instance_update_and_get_original
    columns_to_join=columns_to_join)
  File "nova/db/sqlalchemy/api.py", line 2320, in _instance_update
    columns_to_join=columns_to_join)
  File "nova/db/sqlalchemy/api.py", line 1713, in _instance_get_by_uuid
    raise exception.InstanceNotFound(instance_id=uuid)
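
The sequence in the two tracebacks above can be reproduced in miniature. The sketch below is only an illustration under stated assumptions: the exception classes are local stand-ins for the ones named in the logs, the database is a plain dict, and call_scheduler, set_instance_error and build_instance are hypothetical helpers, not nova or oslo.messaging code. The short timeout is chosen so the example runs quickly; the report mentions a configured value of 60 seconds.

```python
import queue


class MessagingTimeout(Exception):
    """Local stand-in for the timeout error in the first traceback."""


class InstanceNotFound(Exception):
    """Local stand-in for the lookup error in the second traceback."""


def call_scheduler(reply_queue, topic, timeout):
    """Wait for a reply; raise if none arrives before the timeout."""
    try:
        return reply_queue.get(timeout=timeout)
    except queue.Empty:
        raise MessagingTimeout('No reply on topic %s' % topic)


def set_instance_error(db, uuid):
    """Mark an instance as failed; the instance must already be stored."""
    if uuid not in db:
        raise InstanceNotFound(uuid)
    db[uuid]['vm_state'] = 'error'


def build_instance(db, uuid, reply_queue, timeout=0.01):
    """Ask the scheduler for a host and record an error if that fails."""
    try:
        return call_scheduler(reply_queue, 'scheduler', timeout)
    except MessagingTimeout:
        # Error handling touches the database, so it fails a second time
        # when the test never created the instance record.
        set_instance_error(db, uuid)
        return None
```

Calling build_instance({}, 'abc', queue.Queue()) raises InstanceNotFound from inside the timeout handler, which mirrors how one unanswered call produces two separate errors in the log.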

Tags: testing
Hans Lindgren (hanlind)
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/122726

Changed in nova:
status: New → In Progress
Changed in nova:
importance: Undecided → Low
Changed in nova:
assignee: Hans Lindgren (hanlind) → Mike Durnosvistov (mdurnosvistov)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/122726
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Joe Gordon (jogo) wrote :

The patch was abandoned. Confirmed that we are still seeing this, but only twice in subunit_log.

Changed in nova:
assignee: Mike Durnosvistov (mdurnosvistov) → nobody
Joe Gordon (jogo) wrote :

But we are not seeing any unit tests take longer than 30 seconds.

Changed in nova:
status: In Progress → Incomplete
Sean Dague (sdague) wrote :

I believe we actually addressed this upstream with a new oslo.messaging release.

Changed in nova:
status: Incomplete → Fix Released