Hyper-V: failed to destroy instance

Bug #1461970 reported by Lucian Petrut
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Adelina Tuvenie

Bug Description

In some cases, Hyper-V fails to destroy instances, returning error code 32775 ("Invalid state for this operation"). Immediately before this, the instance is reported as having been shut off successfully.

This is quite a serious bug, as it can lead to leaked instances.

Trace: http://paste.openstack.org/show/262589/

Tags: hyper-v
Changed in nova:
assignee: nobody → Lucian Petrut (petrutlucian94)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/188408

Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/188859

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/204007

Changed in nova:
assignee: Lucian Petrut (petrutlucian94) → Adelina Tuvenie (atuvenie)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Petrut Lucian (<email address hidden>) on branch: master
Review: https://review.openstack.org/188408

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/188859
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6d8903b5d9609305d63aba0067a032a66f5ce3ee
Submitter: Jenkins
Branch: master

commit 6d8903b5d9609305d63aba0067a032a66f5ce3ee
Author: Lucian Petrut <email address hidden>
Date: Fri Jun 5 19:46:43 2015 +0300

    Hyper-V: Lock snapshot operation using instance uuid

    The instance snapshot operation is not synchronized within the
    compute manager.

    Attempting to destroy an instance while it's being snapshotted fails
    when using Hyper-V. This can lead to leaked instances.

    In order to avoid this, this patch synchronizes the instance
    snapshot operation within the Hyper-V driver.

    The alternative would be canceling pending WMI jobs when instance
    termination is requested, which we consider introducing as a later
    effort.

    Change-Id: I1ad3f795925503e59bf51ee3148871a88e9eecf0
    Partial-Bug: #1461970
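
The synchronization described in this commit message can be illustrated with a minimal sketch. It is not the code from the patch: it uses only the Python standard library, the names `instance_lock`, `snapshot`, `destroy` and `events` are hypothetical, and it assumes that destroy takes the same per-instance lock as snapshot.

```python
import threading
from collections import defaultdict

# Hypothetical registry: one lock per instance UUID (not nova's real helper).
_registry_guard = threading.Lock()
_instance_locks = defaultdict(threading.Lock)


def instance_lock(instance_uuid):
    """Return the lock shared by all operations on one instance."""
    with _registry_guard:
        return _instance_locks[instance_uuid]


events = []  # records the order in which operations run


def snapshot(instance_uuid, work=lambda: None):
    # Hold the instance lock for the whole duration of the snapshot.
    with instance_lock(instance_uuid):
        events.append(("snapshot-start", instance_uuid))
        work()  # stands in for the long-running snapshot job
        events.append(("snapshot-end", instance_uuid))


def destroy(instance_uuid):
    # Assumption: destroy uses the same lock, so it waits for any
    # snapshot of the same instance to finish before proceeding.
    with instance_lock(instance_uuid):
        events.append(("destroy", instance_uuid))
```

With this arrangement a destroy request issued during a snapshot of the same instance simply waits, while operations on other instances are unaffected because they use different locks.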

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/212482

Changed in nova:
assignee: Adelina Tuvenie (atuvenie) → Cale Rath (ctrath)
Changed in nova:
importance: Undecided → Medium
Alessandro Pilotti (alexpilotti) wrote :

This happens when a destroy operation is performed while another operation is in progress, because the Nova manager executes destroy actions without a lock on the instance.

The resulting race condition is particularly visible in Tempest runs.

The resulting status is a deleted instance as expected, so the error can be ignored if a destroy operation is in progress.
The alternative would be reinstating the instance lock during destroy.
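
The first option mentioned in this comment (ignoring the error while a destroy is in progress) could look roughly like the sketch below. The `HyperVException` class here is a hypothetical stand-in with an assumed `ret_val` attribute, and `destroy_tolerating_invalid_state` is an illustrative name, not a function from nova.

```python
import logging

LOG = logging.getLogger(__name__)

INVALID_STATE = 32775  # "Invalid state for this operation"


class HyperVException(Exception):
    """Hypothetical exception that carries the hypervisor return code."""

    def __init__(self, ret_val):
        super().__init__("Operation failed with return value: %d" % ret_val)
        self.ret_val = ret_val


def destroy_tolerating_invalid_state(destroy_vm, name):
    """Call destroy_vm(name); return False if it failed with 32775.

    Any other failure is re-raised unchanged.
    """
    try:
        destroy_vm(name)
    except HyperVException as exc:
        if exc.ret_val != INVALID_STATE:
            raise
        # Assumption from the comment above: another operation is racing
        # with the destroy, and the instance still ends up deleted.
        LOG.warning("Ignoring return value %d while destroying %s",
                    exc.ret_val, name)
        return False
    return True
```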

Changed in nova:
assignee: Cale Rath (ctrath) → Adelina Tuvenie (atuvenie)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/204007
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=60d89ab2339f751e53bf4d440adca8c0b021cd60
Submitter: Jenkins
Branch: master

commit 60d89ab2339f751e53bf4d440adca8c0b021cd60
Author: Adelina Tuvenie <email address hidden>
Date: Tue Jul 21 02:46:47 2015 -0700

    Fixes Bug "destroy_vm fails with HyperVException"

    When trying to delete an instance that has a pending operation,
    e.g. creating a snapshot, Hyper-V will raise the following exception:

    HyperVException: Operation failed with return value: 32775

    Code 32775 means "Invalid state for this operation". This means
    that the delete operation cannot be performed while there is
    another operation pending.

    This patch fixes this problem by requesting all the instance jobs
    and killing them right before the delete operation.

    Closes Bug: #1461970

    Change-Id: I0e3fca20981e1080814224b8cee7cea5b8c6a53f
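
The approach described in this commit message (stopping the instance's pending jobs right before the delete) can be sketched against an in-memory stand-in for the hypervisor. Everything here is hypothetical: `FakeHost`, `FakeJob`, `pending_jobs` and `destroy_instance` are illustrative names, and the real driver talks to Hyper-V through WMI rather than through objects like these.

```python
INVALID_STATE = 32775  # "Invalid state for this operation"


class HyperVException(Exception):
    pass


class FakeJob:
    """Stand-in for a pending hypervisor job, such as a snapshot."""

    def __init__(self, name):
        self.name = name
        self.running = True

    def stop(self):
        self.running = False


class FakeHost:
    """In-memory model of a host that rejects deletes while jobs run."""

    def __init__(self):
        self.instances = set()
        self.jobs = {}  # instance name -> list of FakeJob

    def pending_jobs(self, name):
        return [job for job in self.jobs.get(name, []) if job.running]

    def destroy_vm(self, name):
        if self.pending_jobs(name):
            raise HyperVException(
                "Operation failed with return value: %d" % INVALID_STATE)
        self.instances.discard(name)


def destroy_instance(host, name):
    # Stop every job still running for this instance, then delete it.
    for job in host.pending_jobs(name):
        job.stop()
    host.destroy_vm(name)
```

Calling `host.destroy_vm` directly while a job is running reproduces the failure from the bug description, whereas `destroy_instance` stops the job first and the delete goes through.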

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-rc1 → 12.0.0
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/kilo)

Change abandoned by Claudiu Belu (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/212482
Reason: not critical enough for kilo.
