cleanup running deleted instance with reap failed with none token context

Bug #1734025 reported by Li Xipeng
This bug affects 1 person
Affects                    Status         Importance  Assigned to  Milestone
OpenStack Compute (nova)   Fix Released   Medium      Li Xipeng
Pike                       Fix Committed  Medium      huanhongda

Bug Description

Description

When zombie instances appear (see also bug https://bugs.launchpad.net/nova/+bug/911366),
set running_deleted_instance_poll_interval = 60 and running_deleted_instance_action = reap; the nova-compute service will then clean up those zombie instances. However, if an instance was booted from a volume or had volumes attached, the instance is removed but its volumes remain in the attached state. Bootable volumes that were used to boot the instance with delete_on_termination=True also still exist in the attached state, even though the instance no longer exists.

Steps to reproduce

1. Set running_deleted_instance_poll_interval = 60 and running_deleted_instance_action = reap.
2. Update a running instance's status to deleted.
3. Restart the nova-compute service and wait 60 seconds.
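
As a rough illustration of step 1, the two options can be rendered as an INI fragment. This is only a sketch: the helper name render_reap_settings is hypothetical, and placing both options in the [DEFAULT] section of nova.conf is an assumption that should be checked against the configuration reference for the release in use.

```python
import configparser
import io


def render_reap_settings(poll_interval=60, action="reap"):
    """Return an INI fragment holding the two options named in step 1."""
    parser = configparser.ConfigParser()
    # Assumption: both options live in the [DEFAULT] section of nova.conf.
    parser["DEFAULT"] = {
        "running_deleted_instance_poll_interval": str(poll_interval),
        "running_deleted_instance_action": action,
    }
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Print the fragment so it can be merged into the real config by hand.
    print(render_reap_settings())
```

The fragment is meant to be merged into the existing configuration manually; the sketch does not touch any file on disk.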

Expected result

The bootable volume used in the test above is deleted, and the volumes attached to the zombie instances are detached.

Actual result

The bootable volume used in the test above remains attached and in-use, and the volumes attached to the zombie instances remain in-use and attached to those instances.
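
The difference between the expected and actual results can be modelled with a small toy. This is not nova code: the Volume and Instance classes, the reap function, and the volume_api_works flag are all hypothetical names used only to illustrate the two outcomes described above.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    name: str
    delete_on_termination: bool = False
    status: str = "in-use"  # one of "in-use", "available", "deleted"


@dataclass
class Instance:
    name: str
    volumes: list = field(default_factory=list)
    destroyed: bool = False


def reap(instance, volume_api_works):
    """Destroy a zombie instance and clean up its volumes (toy model).

    volume_api_works=False mimics the reported failure: the instance is
    destroyed, but calls to the volume service fail, so the volumes are
    left untouched.
    """
    instance.destroyed = True
    if not volume_api_works:
        return
    for vol in instance.volumes:
        # Expected behaviour: delete volumes flagged delete_on_termination,
        # detach every other volume.
        vol.status = "deleted" if vol.delete_on_termination else "available"


if __name__ == "__main__":
    boot = Volume("boot", delete_on_termination=True)
    data = Volume("data")
    reap(Instance("zombie", [boot, data]), volume_api_works=False)
    print(boot.status, data.status)
```

With volume_api_works=False both volumes stay in-use after the instance is gone, which matches the actual result reported here; with True they end up deleted and available, which matches the expected result.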

Li Xipeng (lixipeng)
Changed in nova:
status: New → In Progress
assignee: nobody → Li Xipeng (lixipeng)
Changed in nova:
assignee: Li Xipeng (lixipeng) → Matt Riedemann (mriedem)
Jay Pipes (jaypipes)
summary: - clearup running deleted instance with reap failed with none token
+ cleanup running deleted instance with reap failed with none token
context
Jay Pipes (jaypipes) wrote :

Does this actually happen for non-boot-from-volume instances that have volumes attached?

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/522112
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ca6daf148debb9c9646fcf6db9660c830da5a594
Submitter: Zuul
Branch: master

commit ca6daf148debb9c9646fcf6db9660c830da5a594
Author: lixipeng <email address hidden>
Date: Wed Nov 22 12:03:58 2017 +0800

    Fix bug case by none token context

    When reclaim_instance_interval > 0 is set and an instance that
    was booted from a volume with `delete_on_termination` set to
    true is deleted, then after reclaim_instance_interval has
    passed, the volumes used to boot the instance remain in state
    attached and in-use, although the instance they were attached
    to has been deleted.

    The bug occurs because the admin context from
    `nova.compute.manager._reclaim_queued_deletes` does not have
    any token info, so the call to the cinder API fails.

    So add user/project CONF options with the admin role to the
    cinder group, and when the context is_admin and has no token,
    authenticate with the user/project info to call the cinder API.

    Change-Id: I3c35bba43fee81baebe8261f546c1424ce3a3383
    Closes-Bug: #1733736
    Closes-Bug: #1734025
    Partial-Bug: #1736773
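
The decision described in the commit message can be sketched roughly as follows. This is not the code from the change itself: RequestContext, ServiceCredentials, and choose_auth are hypothetical stand-ins, and the real implementation in nova/volume/cinder.py may be structured quite differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestContext:
    """Hypothetical stand-in for a request context."""
    is_admin: bool
    auth_token: Optional[str] = None


@dataclass
class ServiceCredentials:
    """Hypothetical user/project settings read from the cinder group."""
    username: str
    project_name: str


def choose_auth(context, service_creds):
    """Pick how to authenticate a call to the volume service.

    A context that carries a token uses it. An admin context without a
    token (for example one created by a periodic task) falls back to the
    configured service credentials instead of failing.
    """
    if context.auth_token:
        return ("token", context.auth_token)
    if context.is_admin and service_creds is not None:
        return ("password", service_creds.username, service_creds.project_name)
    raise PermissionError("no token and no usable service credentials")


if __name__ == "__main__":
    periodic_ctx = RequestContext(is_admin=True)  # no token available
    creds = ServiceCredentials("nova", "service")
    print(choose_auth(periodic_ctx, creds))
```

The point of the fallback is that a periodic task has no user request to borrow a token from, so the only way to reach the volume service is with credentials the operator has configured.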

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0rc1

This issue was fixed in the openstack/nova 17.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/603044

Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Li Xipeng (lixipeng)
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/603044
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4d7148709c5de098141fbee12ad2e78c61e3b174
Submitter: Zuul
Branch: stable/pike

commit 4d7148709c5de098141fbee12ad2e78c61e3b174
Author: lixipeng <email address hidden>
Date: Wed Nov 22 12:03:58 2017 +0800

    Fix bug case by none token context

    When reclaim_instance_interval > 0 is set and an instance that
    was booted from a volume with `delete_on_termination` set to
    true is deleted, then after reclaim_instance_interval has
    passed, the volumes used to boot the instance remain in state
    attached and in-use, although the instance they were attached
    to has been deleted.

    The bug occurs because the admin context from
    `nova.compute.manager._reclaim_queued_deletes` does not have
    any token info, so the call to the cinder API fails.

    So add user/project CONF options with the admin role to the
    cinder group, and when the context is_admin and has no token,
    authenticate with the user/project info to call the cinder API.

    Conflicts:
        nova/volume/cinder.py
        nova/tests/unit/test_cinder.py

    NOTE(huanhongda): The conflict is due to not having change
    Ifc01dbf98545104c998ab96f65ff8623a6db0f28 in Pike.

    Change-Id: I3c35bba43fee81baebe8261f546c1424ce3a3383
    Closes-Bug: #1733736
    Closes-Bug: #1734025
    Partial-Bug: #1736773
    (cherry picked from commit ca6daf148debb9c9646fcf6db9660c830da5a594)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.8

This issue was fixed in the openstack/nova 16.1.8 release.
