network-changed external_instance_event fails and traces hard if InstanceInfoCacheNotFound

Bug #1487025 reported by Matt Riedemann
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Matt Riedemann

Bug Description

I saw this in a failed jenkins run with neutron:

http://logs.openstack.org/82/200382/6/check/gate-tempest-dsvm-neutron-full/cd9bfaa/logs/screen-n-cpu.txt.gz?level=TRACE#_2015-08-20_12_36_06_873

2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher executor_callback))
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher executor_callback)
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 129, in _do_dispatch
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/new/nova/nova/exception.py", line 89, in wrapped
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher payload)
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 119, in __exit__
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/new/nova/nova/exception.py", line 72, in wrapped
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher return f(self, context, *args, **kw)
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/new/nova/nova/compute/manager.py", line 6310, in external_instance_event
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher self.network_api.get_instance_nw_info(context, instance)
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/new/nova/nova/network/base_api.py", line 244, in get_instance_nw_info
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher result = self._get_instance_nw_info(context, instance, **kwargs)
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/new/nova/nova/network/neutronv2/api.py", line 873, in _get_instance_nw_info
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher compute_utils.refresh_info_cache_for_instance(context, instance)
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/new/nova/nova/compute/utils.py", line 356, in refresh_info_cache_for_instance
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher instance.info_cache.refresh()
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 195, in wrapper
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher ctxt, self, fn.__name__, args, kwargs)
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/new/nova/nova/conductor/rpcapi.py", line 248, in object_action
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher objmethod=objmethod, args=args, kwargs=kwargs)
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher retry=self.retry)
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher timeout=timeout, retry=retry)
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 431, in send
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher retry=retry)
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 422, in _send
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher raise result
2015-08-20 12:36:06.873 9235 ERROR oslo_messaging.rpc.dispatcher InstanceInfoCacheNotFound_Remote: Info cache for instance d9b950aa-be7c-40bd-93b4-d4d575ebdace could not be found.

Basically, the instance is deleted before the event is processed:

http://logs.openstack.org/82/200382/6/check/gate-tempest-dsvm-neutron-full/cd9bfaa/logs/screen-n-cpu.txt.gz#_2015-08-20_12_36_06_097

http://logs.openstack.org/82/200382/6/check/gate-tempest-dsvm-neutron-full/cd9bfaa/logs/screen-n-cpu.txt.gz#_2015-08-20_12_36_06_795

And things fail.

We should handle 404 type errors in the compute manager's external_instance_event method so that we don't fail to process all events when one fails in the loop.

Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/215170

Changed in nova:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/215170
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=402b7efebb2cf9ce7759eb6b9c7e0cfaa7deb0e5
Submitter: Jenkins
Branch: master

commit 402b7efebb2cf9ce7759eb6b9c7e0cfaa7deb0e5
Author: Matt Riedemann <email address hidden>
Date: Thu Aug 20 07:01:32 2015 -0700

    Handle NotFound exceptions while processing network-changed events

    We could be getting a network-changed event from neutron in the case
    that something is being torn down on the networking side, in which case
    we're in a race on the nova side since we're probably deleting the
    instance and it's network info_cache.

    We should handle NotFound exceptions in that case so that we don't stop
    looping over all events and fail to process any others that come after
    that network-changed event.

    Closes-Bug: #1487025

    Change-Id: Icbc4ed9776de710c0102cf411f5c9808c1a05e83

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-3 → 12.0.0
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.