When the source compute service comes back up, it does not destroy and clean up instances that were evacuated and then deleted.

Bug #1745977 reported by jiangyuhao
This bug affects 1 person
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   Medium      Dan Smith
Ocata                     Fix Committed  Medium      Matt Riedemann
Pike                      Fix Committed  Medium      Matt Riedemann
Queens                    Fix Committed  Medium      Matt Riedemann

Bug Description

Description
===========
After an instance has been successfully evacuated to the destination host and then deleted, the source host fails to clean up that instance when it comes back up.

Steps to reproduce
==================
1. Deploy a local instance on the source host.
2. Power off the source host.
3. Evacuate the instance to the destination host.
4. Delete the instance.
5. Power on the source host.

Expected result
===============
The nova-compute service on the source host cleans up the evacuated and deleted instance.

Actual result
=============
The instance is still present on the source host.

Environment
===========
OpenStack Pike
Libvirt + KVM
OVS networking

Logs & Configs
==============
source host nova-compute log:

2018-01-29 10:28:48.664 9364 ERROR oslo_service.service [req-7bdfe28f-0464-4af8-bdd0-2d433b25d84a - - - - -] Error starting thread.: InstanceNotFound_Remote: Instance 19022200-7abc-423d-90bd-e9dcd0887679 could not be found.
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 125, in _object_dispatch
    return getattr(target, method)(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 184, in wrapper
    result = fn(cls, context, *args, **kwargs)

  File "/usr/lib/python2.7/site-packages/nova/objects/instance.py", line 474, in get_by_uuid
    use_slave=use_slave)

  File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 235, in wrapper
    return f(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/nova/objects/instance.py", line 466, in _db_instance_get_by_uuid
    columns_to_join=columns_to_join)

  File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 744, in instance_get_by_uuid
    return IMPL.instance_get_by_uuid(context, uuid, columns_to_join)

  File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 179, in wrapper
    return f(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 280, in wrapped
    return f(context, *args, **kwargs)

  File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 1911, in instance_get_by_uuid
    columns_to_join=columns_to_join)

  File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 1920, in _instance_get_by_uuid
    raise exception.InstanceNotFound(instance_id=uuid)

InstanceNotFound: Instance 19022200-7abc-423d-90bd-e9dcd0887679 could not be found.
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service Traceback (most recent call last):
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 721, in run_service
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service service.start()
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/service.py", line 156, in start
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service self.manager.init_host()
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1173, in init_host
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service self._destroy_evacuated_instances(context)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 691, in _destroy_evacuated_instances
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service bdi, destroy_disks)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 909, in destroy
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service destroy_disks)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1032, in cleanup
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service attempts = int(instance.system_metadata.get('clean_attempts',
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 67, in getter
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service self.obj_load_attr(name)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/objects/instance.py", line 1131, in obj_load_attr
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service self._load_generic(attrname)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/objects/instance.py", line 858, in _load_generic
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service expected_attrs=[attrname])
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 177, in wrapper
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service args, kwargs)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/conductor/rpcapi.py", line 240, in object_class_action_versions
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service args=args, kwargs=kwargs)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 167, in call
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service retry=self.retry)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 123, in _send
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service timeout=timeout, retry=retry)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 566, in send
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service retry=retry)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 557, in _send
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service raise result
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service InstanceNotFound_Remote: Instance 19022200-7abc-423d-90bd-e9dcd0887679 could not be found.
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service Traceback (most recent call last):
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 125, in _object_dispatch
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service return getattr(target, method)(*args, **kwargs)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 184, in wrapper
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service result = fn(cls, context, *args, **kwargs)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/objects/instance.py", line 474, in get_by_uuid
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service use_slave=use_slave)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 235, in wrapper
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service return f(*args, **kwargs)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/objects/instance.py", line 466, in _db_instance_get_by_uuid
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service columns_to_join=columns_to_join)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 744, in instance_get_by_uuid
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service return IMPL.instance_get_by_uuid(context, uuid, columns_to_join)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 179, in wrapper
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service return f(*args, **kwargs)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 280, in wrapped
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service return f(context, *args, **kwargs)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 1911, in instance_get_by_uuid
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service columns_to_join=columns_to_join)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 1920, in _instance_get_by_uuid
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service raise exception.InstanceNotFound(instance_id=uuid)
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service
2018-01-29 10:28:48.664 9364 ERROR oslo_service.service InstanceNotFound: Instance 19022200-7abc-423d-90bd-e9dcd0887679 could not be found.
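
The traceback above can be read as a chain: init_host() calls _destroy_evacuated_instances(), the libvirt cleanup reads instance.system_metadata, that attribute is lazy-loaded from the database, and the lookup raises InstanceNotFound because the instance row is already soft-deleted. The following is a minimal, self-contained simulation of that chain, not nova code. The toy database, the Instance class and the function bodies are all assumptions made for illustration; only the call order and the 'clean_attempts' key are taken from the log.

```python
class InstanceNotFound(Exception):
    """Stand-in for nova's exception of the same name."""


# Toy database: uuid -> row. The row is soft-deleted, as after step 4.
DB = {
    "19022200": {"deleted": True, "system_metadata": {"clean_attempts": "0"}},
}


def db_instance_get(uuid, read_deleted="no"):
    """Return the row, hiding soft-deleted rows unless read_deleted='yes'."""
    row = DB.get(uuid)
    if row is None or (row["deleted"] and read_deleted != "yes"):
        raise InstanceNotFound("Instance %s could not be found." % uuid)
    return row


class Instance:
    """Hypothetical object that lazy-loads system_metadata on first access."""

    def __init__(self, uuid):
        self.uuid = uuid
        self._loaded = {}

    @property
    def system_metadata(self):
        if "system_metadata" not in self._loaded:
            # Pre-fix behaviour: the lookup ignores soft-deleted rows.
            row = db_instance_get(self.uuid)
            self._loaded["system_metadata"] = row["system_metadata"]
        return self._loaded["system_metadata"]


def destroy_evacuated_instances(evacuated, local_guests):
    """Remove guests that were evacuated away from this host."""
    for inst in evacuated:
        # Mirrors the attribute access seen in the libvirt cleanup frame.
        int(inst.system_metadata.get("clean_attempts", "0"))
        local_guests.discard(inst.uuid)


def init_host(evacuated, local_guests):
    """Service startup hook: any exception here aborts the start."""
    destroy_evacuated_instances(evacuated, local_guests)
    return "started"


guests = {"19022200"}
try:
    outcome = init_host([Instance("19022200")], guests)
except InstanceNotFound as exc:
    outcome = "startup aborted: %s" % exc
print(outcome)
print(sorted(guests))
```

In this sketch the exception escapes init_host(), so the guest is never removed from the local set, which matches the "Actual result" reported above.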

jiangyuhao (jiang-yuhao)
Changed in nova:
milestone: none → ongoing
tags: added: evacuate openstack-version.pike
Changed in nova:
milestone: ongoing → none
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/543970

Changed in nova:
assignee: nobody → Dan Smith (danms)
status: New → In Progress
Matt Riedemann (mriedem) wrote :

It looks like this was a regression caused by this change, which was backported to stable/ocata:

https://review.openstack.org/#/q/Ib5f6b03189b7fc5cd0b226ea2dca74865fbef12a

Before that change, we did not get deleted migration records when the source compute host came back up, but now we do.

Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/545987

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/545988

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/545989

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/543970
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6ba8a35825a7ec839b2d0aab7559351d573130ab
Submitter: Zuul
Branch: master

commit 6ba8a35825a7ec839b2d0aab7559351d573130ab
Author: Dan Smith <email address hidden>
Date: Tue Feb 13 07:16:57 2018 -0800

    Lazy-load instance attributes with read_deleted=yes

    If we're doing a lazy-load of a generic attribute on instance, we
    should be using read_deleted=yes. Otherwise we just fail in the load
    process which is confusing and not helpful to a cleanup routine that
    needs to handle the deleted instance. This makes us load those things
    with read_deleted=yes.

    Change-Id: Ide6cc5bb1fce2c9aea9fa3efdf940e8308cd9ed0
    Closes-Bug: #1745977
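
The idea in the commit message above can be sketched as follows: during a generic lazy-load, the request context is temporarily switched to read_deleted='yes' so the soft-deleted row is still visible, and the previous setting is restored afterwards. This is a simplified illustration under stated assumptions, not the actual patch. The Context class, the temporary_read_deleted helper, the ROWS table and load_generic are hypothetical names invented for this example.

```python
import contextlib


class InstanceNotFound(Exception):
    """Stand-in for nova's exception of the same name."""


class Context:
    """Hypothetical request context carrying a read_deleted setting."""

    def __init__(self):
        self.read_deleted = "no"


@contextlib.contextmanager
def temporary_read_deleted(context, value):
    """Set context.read_deleted for the duration of the block only."""
    old = context.read_deleted
    context.read_deleted = value
    try:
        yield
    finally:
        context.read_deleted = old


# Toy table with one soft-deleted instance row.
ROWS = {"abc": {"deleted": True, "system_metadata": {"clean_attempts": "2"}}}


def get_row(context, uuid):
    """Hide soft-deleted rows unless the context asks for them."""
    row = ROWS.get(uuid)
    if row is None or (row["deleted"] and context.read_deleted != "yes"):
        raise InstanceNotFound(uuid)
    return row


def load_generic(context, uuid, attrname):
    """Lazy-load one attribute, also for instances that are deleted."""
    with temporary_read_deleted(context, "yes"):
        return get_row(context, uuid)[attrname]


ctx = Context()
print(load_generic(ctx, "abc", "system_metadata"))
print(ctx.read_deleted)
```

Scoping the override to the load keeps other queries made with the same context on their default filter, which is why the sketch restores the old value in a finally clause.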

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/545987
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=619754f5c836ed1b58c807138836e6cf5a4e6904
Submitter: Zuul
Branch: stable/queens

commit 619754f5c836ed1b58c807138836e6cf5a4e6904
Author: Dan Smith <email address hidden>
Date: Tue Feb 13 07:16:57 2018 -0800

    Lazy-load instance attributes with read_deleted=yes

    If we're doing a lazy-load of a generic attribute on instance, we
    should be using read_deleted=yes. Otherwise we just fail in the load
    process which is confusing and not helpful to a cleanup routine that
    needs to handle the deleted instance. This makes us load those things
    with read_deleted=yes.

    Change-Id: Ide6cc5bb1fce2c9aea9fa3efdf940e8308cd9ed0
    Closes-Bug: #1745977
    (cherry picked from commit 6ba8a35825a7ec839b2d0aab7559351d573130ab)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0rc3

This issue was fixed in the openstack/nova 17.0.0.0rc3 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/545989
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e0c1d461af0701adb94e6974f363e12395ed0162
Submitter: Zuul
Branch: stable/ocata

commit e0c1d461af0701adb94e6974f363e12395ed0162
Author: Dan Smith <email address hidden>
Date: Tue Feb 13 07:16:57 2018 -0800

    Lazy-load instance attributes with read_deleted=yes

    If we're doing a lazy-load of a generic attribute on instance, we
    should be using read_deleted=yes. Otherwise we just fail in the load
    process which is confusing and not helpful to a cleanup routine that
    needs to handle the deleted instance. This makes us load those things
    with read_deleted=yes.

    Change-Id: Ide6cc5bb1fce2c9aea9fa3efdf940e8308cd9ed0
    Closes-Bug: #1745977
    (cherry picked from commit 6ba8a35825a7ec839b2d0aab7559351d573130ab)
    (cherry picked from commit 619754f5c836ed1b58c807138836e6cf5a4e6904)
    (cherry picked from commit 1407079d4008c6304799dd83f5bf4ba505d8e438)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/545988
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1407079d4008c6304799dd83f5bf4ba505d8e438
Submitter: Zuul
Branch: stable/pike

commit 1407079d4008c6304799dd83f5bf4ba505d8e438
Author: Dan Smith <email address hidden>
Date: Tue Feb 13 07:16:57 2018 -0800

    Lazy-load instance attributes with read_deleted=yes

    If we're doing a lazy-load of a generic attribute on instance, we
    should be using read_deleted=yes. Otherwise we just fail in the load
    process which is confusing and not helpful to a cleanup routine that
    needs to handle the deleted instance. This makes us load those things
    with read_deleted=yes.

    Change-Id: Ide6cc5bb1fce2c9aea9fa3efdf940e8308cd9ed0
    Closes-Bug: #1745977
    (cherry picked from commit 6ba8a35825a7ec839b2d0aab7559351d573130ab)
    (cherry picked from commit 619754f5c836ed1b58c807138836e6cf5a4e6904)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.1

This issue was fixed in the openstack/nova 16.1.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b1

This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.1.1

This issue was fixed in the openstack/nova 15.1.1 release.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/575190
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=604819b29c0bd43969747d32f6e3d818b3cbece7
Submitter: Zuul
Branch: master

commit 604819b29c0bd43969747d32f6e3d818b3cbece7
Author: Dan Smith <email address hidden>
Date: Wed Jun 13 11:14:37 2018 -0700

    Always read-deleted=yes on lazy-load

    For some reason we were only reading deleted instances when loading generic
    fields and not things like flavor. That weird behavior isn't very helpful,
    so this makes us always read deleted for that case. Some of the fields, like
    tags, will short-circuit that and just immediately lazy-load an empty set.
    But for anything else, we should allow reading that data if it's still there.

    With this change, we are able to remove a specific read_deleted='yes' usage
    from ComputeManager._destroy_evacuated_instances() which is handled with
    the generic solution. TestEvacuateDeleteServerRestartOriginalCompute asserts
    that the evacuate scenario is still fixed.

    Related-Bug: #1794996
    Related-Bug: #1745977

    Change-Id: I8ec3a3a697e55941ee447d0b52d29785717e4bf0
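
The related change above generalizes the earlier fix: every lazy-load reads deleted rows, while some fields such as tags skip the database and load an empty set. A rough sketch of that dispatch is shown below. It is not the nova implementation. The Instance class, load_attr, and the ROWS table are hypothetical, and the sketch assumes the tags short-circuit applies only when the instance is already marked deleted, which the commit message does not spell out.

```python
class InstanceNotFound(Exception):
    """Stand-in for nova's exception of the same name."""


# Toy table: a soft-deleted instance that still has flavor data stored.
ROWS = {"abc": {"deleted": True, "flavor": "m1.small", "system_metadata": {}}}


def get_row(uuid, read_deleted="no"):
    """Hide soft-deleted rows unless read_deleted='yes' is requested."""
    row = ROWS.get(uuid)
    if row is None or (row["deleted"] and read_deleted != "yes"):
        raise InstanceNotFound(uuid)
    return row


class Instance:
    """Hypothetical object with a single lazy-load entry point."""

    def __init__(self, uuid, deleted=False):
        self.uuid = uuid
        self.deleted = deleted
        self._attrs = {}

    def load_attr(self, name):
        if name in self._attrs:
            return self._attrs[name]
        if name == "tags" and self.deleted:
            # Assumed short-circuit: no query, just an empty set.
            value = set()
        else:
            # Every other field is read with read_deleted='yes'.
            value = get_row(self.uuid, read_deleted="yes")[name]
        self._attrs[name] = value
        return value


inst = Instance("abc", deleted=True)
print(inst.load_attr("flavor"))
print(inst.load_attr("tags"))
```

With a single code path for all fields, a cleanup routine does not need its own read_deleted='yes' override, which is the simplification the commit message describes for _destroy_evacuated_instances().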
