InstanceNotFound stacktrace in _process_instance_vif_deleted_event

Bug #1645464 reported by Matt Riedemann
Affects                     Status          Importance   Assigned to        Milestone
OpenStack Compute (nova)    Fix Released    Medium       Matt Riedemann
  Newton                    Fix Committed   Medium       Matt Riedemann

Bug Description

Seen in a CI run here:

http://logs.openstack.org/16/393416/6/check/gate-tempest-dsvm-full-devstack-plugin-ceph-ubuntu-xenial/44f3329/logs/screen-n-cpu.txt.gz?level=TRACE#_2016-11-28_17_34_36_535

2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server [req-8d784631-28c2-41b4-8175-7cfb59480838 nova service] Exception during message handling
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 155, in _process_incoming
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 225, in dispatch
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 195, in _do_dispatch
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/exception_wrapper.py", line 75, in wrapped
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server function_name, call_dict, binary)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server self.force_reraise()
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/exception_wrapper.py", line 66, in wrapped
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/compute/manager.py", line 6800, in external_instance_event
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server event.tag)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/compute/manager.py", line 6755, in _process_instance_vif_deleted_event
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server network_info = instance.info_cache.network_info
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 67, in getter
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server self.obj_load_attr(name)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/objects/instance.py", line 1063, in obj_load_attr
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server self._load_generic(attrname)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/objects/instance.py", line 819, in _load_generic
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server expected_attrs=[attrname])
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 177, in wrapper
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server args, kwargs)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/conductor/rpcapi.py", line 236, in object_class_action_versions
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server args=args, kwargs=kwargs)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 169, in call
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server retry=self.retry)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 97, in _send
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server timeout=timeout, retry=retry)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 467, in send
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server retry=retry)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 458, in _send
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server raise result
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server InstanceNotFound_Remote: Instance f2cbed80-0516-4c2a-b575-24b9885ec189 could not be found.
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/conductor/manager.py", line 87, in _object_dispatch
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server return getattr(target, method)(*args, **kwargs)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server result = fn(cls, context, *args, **kwargs)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/objects/instance.py", line 463, in get_by_uuid
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server use_slave=use_slave)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 226, in wrapper
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/objects/instance.py", line 455, in _db_instance_get_by_uuid
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server columns_to_join=columns_to_join)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/db/api.py", line 725, in instance_get_by_uuid
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server return IMPL.instance_get_by_uuid(context, uuid, columns_to_join)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 170, in wrapper
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 271, in wrapped
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server return f(context, *args, **kwargs)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 1885, in instance_get_by_uuid
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server columns_to_join=columns_to_join)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 1894, in _instance_get_by_uuid
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server raise exception.InstanceNotFound(instance_id=uuid)
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server InstanceNotFound: Instance f2cbed80-0516-4c2a-b575-24b9885ec189 could not be found.
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server
2016-11-28 17:34:36.535 2361 ERROR oslo_messaging.rpc.server

We have a race between deleting an instance and deleting the ports attached to that instance. Neutron sends the network-vif-deleted event to nova, and nova tries to refresh the InstanceInfoCache.network_info field, but the instance is already deleted, so the lookup fails and we log a stacktrace. We should handle the InstanceNotFound so we don't stacktrace in the n-cpu logs.
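
A minimal sketch of the handling described above, assuming nova is importable; this is not the exact patch, just the try/except pattern around the lazy-load. The real handler is ComputeManager._process_instance_vif_deleted_event; the function name and trailing logic below are simplified placeholders.

    import logging

    from nova import exception

    LOG = logging.getLogger(__name__)


    def process_vif_deleted_event(context, instance, deleted_vif_id):
        try:
            # Touching info_cache can lazy-load it via conductor, which
            # raises InstanceNotFound if the instance row is already gone.
            network_info = instance.info_cache.network_info
        except exception.InstanceNotFound:
            LOG.debug('Instance %s was deleted while handling the '
                      'network-vif-deleted event for vif %s; nothing to do.',
                      instance.uuid, deleted_vif_id)
            return
        # ... continue tearing down the vif using network_info ...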

Tags: compute
Matt Riedemann (mriedem)
Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/403925

Changed in nova:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/403925
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4fce45f83bd510202fd1cbcdb7694361c41e9400
Submitter: Jenkins
Branch: master

commit 4fce45f83bd510202fd1cbcdb7694361c41e9400
Author: Matt Riedemann <email address hidden>
Date: Mon Nov 28 17:07:51 2016 -0500

    Pre-load info_cache when handling external events and handle NotFound

    Before change a5b920a197c70d2ae08a1e1335d979857f923b4f we'd join the
    info_cache column when getting the instance in the API. Without
    joining the info_cache column when getting the instance, that has
    to be lazy-loaded on the compute when processing an external
    neutron event, like network-vif-deleted. So this change pre-loads
    the info_cache in the API again as an optimization.

    There is also a race to contend with here when deleting an instance.
    Neutron can send the network-vif-deleted event after we've already
    marked the instance as deleted in the database, at which point
    lazy-loading info_cache (or updating InstanceInfoCache for that
    matter), can result in an Instance(InfoCache)NotFound error, so this
    change handles that also.

    Change-Id: Ia3b4288691b392d56324e9d13c92e8e0b0d81e76
    Closes-Bug: #1645464
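
For reference, a hedged sketch of the pre-load half of the change: the expected_attrs parameter of Instance.get_by_uuid is real nova object API, but the helper name and its shape here are placeholders rather than the exact code from the review above.

    from nova import objects


    def get_instances_with_info_cache(context, instance_uuids):
        # Asking the DB layer to join info_cache means instance.info_cache
        # is already populated when the object reaches the compute, so
        # instance.info_cache.network_info will not trigger a lazy-load.
        return [objects.Instance.get_by_uuid(context, uuid,
                                             expected_attrs=['info_cache'])
                for uuid in instance_uuids]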

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/427732

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.0.0rc1

This issue was fixed in the openstack/nova 15.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/427732
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d70d02f4af78ec9026cf172b8bcd11b7c451f17c
Submitter: Jenkins
Branch: stable/newton

commit d70d02f4af78ec9026cf172b8bcd11b7c451f17c
Author: Matt Riedemann <email address hidden>
Date: Mon Nov 28 17:07:51 2016 -0500

    Pre-load info_cache when handling external events and handle NotFound

    Before change a5b920a197c70d2ae08a1e1335d979857f923b4f we'd join the
    info_cache column when getting the instance in the API. Without
    joining the info_cache column when getting the instance, that has
    to be lazy-loaded on the compute when processing an external
    neutron event, like network-vif-deleted. So this change pre-loads
    the info_cache in the API again as an optimization.

    There is also a race to contend with here when deleting an instance.
    Neutron can send the network-vif-deleted event after we've already
    marked the instance as deleted in the database, at which point
    lazy-loading info_cache (or updating InstanceInfoCache for that
    matter), can result in an Instance(InfoCache)NotFound error, so this
    change handles that also.

    Change-Id: Ia3b4288691b392d56324e9d13c92e8e0b0d81e76
    Closes-Bug: #1645464
    (cherry picked from commit 4fce45f83bd510202fd1cbcdb7694361c41e9400)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.5

This issue was fixed in the openstack/nova 14.0.5 release.
