nova raises "InstanceNotFound" when running the periodic reclaim

Bug #1587356 reported by Alvaro Lopez
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Incomplete
Undecided
Alvaro Lopez

Bug Description

Nova sometimes raises an exception when running the periodic instance deleted,
see below. The error is quite spurious, and it is not happening always so I
cannot get the reason why this is happening. The periodic reclaim is trying to
delete the instance, but the DB is not returning the instance as it is marked
as deleted.

    ERROR nova.compute.manager Traceback (most recent call last):
    ERROR nova.compute.manager
    ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 447, in _object_dispatch
    ERROR nova.compute.manager return getattr(target, method)(*args, **kwargs)
    ERROR nova.compute.manager
    ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 213, in wrapper
    ERROR nova.compute.manager return fn(self, *args, **kwargs)
    ERROR nova.compute.manager
    ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/objects/instance.py", line 673, in save
    ERROR nova.compute.manager columns_to_join=_expected_cols(expected_attrs))
    ERROR nova.compute.manager
    ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 765, in instance_update_and_get_original
    ERROR nova.compute.manager expected=expected)
    ERROR nova.compute.manager
    ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 216, in wrapper
    ERROR nova.compute.manager return f(*args, **kwargs)
    ERROR nova.compute.manager
    ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 146, in wrapper
    ERROR nova.compute.manager ectxt.value = e.inner_exc
    ERROR nova.compute.manager
    ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__
    ERROR nova.compute.manager six.reraise(self.type_, self.value, self.tb)
    ERROR nova.compute.manager
    ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 136, in wrapper
    ERROR nova.compute.manager return f(*args, **kwargs)
    ERROR nova.compute.manager
    ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 2478, in instance_update_and_get_original
    ERROR nova.compute.manager session=session)
    ERROR nova.compute.manager
    ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 1733, in _instance_get_by_uuid
    ERROR nova.compute.manager raise exception.InstanceNotFound(instance_id=uuid)
    ERROR nova.compute.manager
    ERROR nova.compute.manager InstanceNotFound: Instance aceb4514-915e-4687-ac99-20f6b7d5ed6d could not be found.
    ERROR nova.compute.manager
    ERROR nova.compute.manager
    WARNING nova.compute.manager [req-e62c7e7c-0799-4f87-8f06-a46cf5663733 - - - - -] [instance: aceb4514-915e-4687-ac99-20f6b7d5ed6d] Periodic reclaim failed to delete instance: Instance aceb4514-915e-4687-ac99-20f6b7d5ed6d could not be found.

As I said, this is an spurious error, but from time to time we get a bunch of
these errors and we need to manually delete those instances.

Alvaro Lopez (aloga)
Changed in nova:
assignee: nobody → Alvaro Lopez (aloga)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/323250

Changed in nova:
status: New → In Progress
Revision history for this message
Matt Riedemann (mriedem) wrote :

You don't have the full trace, but what's probably happening here is that the code is lazy-loading attributes on the instance object but because the instance is deleted, it fails to lazy-load the full instance from the database because of the InstanceNotFound. The question is what's getting lazy-loaded and causing this to fail?

Revision history for this message
Matt Riedemann (mriedem) wrote :

Also, what release are you hitting this on? Newton? master? Mitaka, other?

Revision history for this message
jichenjc (jichenjc) wrote :
Download full text (8.5 KiB)

copy from
http://paste.openstack.org/show/592513/

2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager Traceback (most recent call last):
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2358, in _delete_instance
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager self._shutdown_instance(context, instance, bdms)
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2278, in _shutdown_instance
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager requested_networks)
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager self.force_reraise()
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager six.reraise(self.type_, self.value, self.tb)
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2266, in _shutdown_instance
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager block_device_info)
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 816, in destroy
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager destroy_disks, migrate_data)
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 926, in cleanup
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager instance.save()
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 207, in wrapper
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager ctxt, self, fn.__name__, args, kwargs)
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/nova/conductor/rpcapi.py", line 242, in object_action
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager objmethod=objmethod, args=args, kwargs=kwargs)
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager retry=self.retry)
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager timeout=timeout, retry=retry)
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 470, in send
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager retry=retry)
2016-05-30 13:23:03.335 30103 ERROR nova.compute.manager File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 461, in _send
2016-05...

Read more...

Revision history for this message
Alvaro Lopez (aloga) wrote :

Hi Matt,

this has happened at our cloud since several releases, IIRC: uno, Kilo, Liberty and now, Mitaka.

Revision history for this message
Alvaro Lopez (aloga) wrote :

Would it be possible that this bug gets some attention? We are still seeing it in any OpenStack version we have deployed up to now. The change (in Gerrit) is a trivial change that has been in the queue for 2 years: https://review.openstack.org/#/c/323250/

Revision history for this message
Alvaro Lopez (aloga) wrote :

I can't believe that this bug is taking more than 2 years to be fixed (well, actually merged, as the fix is already in Gerrit).

I actually doubt that this will be eventually merged, but this is a bug that, as I said, is hitting us since 2 years in any deployed OpenStack version up to date.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Alvaro, I understand the frustration but your change has also had review from a few cores, including myself, and the main question - how does an instance get into this state in the first place? - has not been answered so it's hard to move forward with fixing something that doesn't seem like it should be possible nor do we have any kind of recreate steps for it.

Revision history for this message
Alvaro Lopez (aloga) wrote :

Well, it is impossible for me to provide a method to reproduce this, as of today, October 2019 (3 years and almost a half later) this is not present anymore for new instances

For the sake of clarity: our DB is not mangled by means of any external tooling, so this was something caused by nova. I reckon that this is not present anymore for new instances (but we still do have several in the DB that are raising errors).

I would close the ticket as invalid and I will fix manually the DB instances.

Revision history for this message
Alvaro Lopez (aloga) wrote :

s/invalid/wont-fix/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Alvaro Lopez Garcia (<email address hidden>) on branch: master
Review: https://review.opendev.org/323250

Revision history for this message
Alvaro Lopez (aloga) wrote :

By the way, thank you for your time reviewing this again.

Changed in nova:
status: In Progress → Incomplete
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.