instance data resides on destination node when vm is deleted during live-migration

Bug #1285000 reported by Abhishek Kekane on 2014-02-26
This bug affects 15 people
Affects                     Importance  Assigned to
OpenStack Compute (nova)    Medium      Maciej Jozefczyk
Newton                      Medium      Maciej Jozefczyk
Ocata                       Medium      Maciej Jozefczyk
Pike                        Medium      Maciej Jozefczyk

Bug Description

If a VM is deleted during the live-migration process, the instance data residing on the destination compute node may not be deleted.
Please refer to http://paste.openstack.org/show/69730/ to reproduce the issue.

IMO, one possible solution is to prevent the user from deleting the VM while live migration is in progress.

summary: - Delete a VM when live-migration is in progress
+ instance data resides on destination node when vm is deleted during
+ live-migration
description: updated
Michael Still (mikal) on 2014-03-13
tags: added: compute live-migration
melanie witt (melwitt) on 2014-03-16
Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
Simon Chang (changsimon) on 2014-03-25
Changed in nova:
assignee: nobody → Simon Chang (changsimon)
Simon Chang (changsimon) on 2014-05-29
Changed in nova:
assignee: Simon Chang (changsimon) → nobody
Takahiro Shida (shida) wrote :

IMO, here are some possible solutions to this problem.
1. When a delete request arrives, the live-migration process starts to cancel.
The live-migration process checks the instance status, and if the status has changed to deleting, it calls rollback_migration in the relevant places.

2. Block the delete request until the migration ends.
The delete request is suspended in nova-api while the instance status is MIGRATING.

Hmm... maybe taskflow would solve this problem more elegantly.
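
A minimal sketch of option 2, for illustration only. It rejects the delete rather than suspending it, which is a simplification. The Instance class, the InstanceBusyError exception, the request_delete function and the "migrating"/"deleting" strings are hypothetical stand-ins, not nova's actual API code.

```python
from dataclasses import dataclass
from typing import Optional


class InstanceBusyError(Exception):
    """Raised when a delete is requested while a live migration is running."""


@dataclass
class Instance:
    # Hypothetical stand-in for an instance record; not nova's Instance object.
    uuid: str
    task_state: Optional[str] = None  # assumed to be "migrating" during LM


def request_delete(instance: Instance) -> Instance:
    """Refuse the delete while the instance is migrating, otherwise mark it."""
    if instance.task_state == "migrating":
        # Option 2 from the comment above: do not let the delete proceed
        # until the migration has ended.
        raise InstanceBusyError(
            "instance %s is being live migrated" % instance.uuid)
    instance.task_state = "deleting"
    return instance
```

A caller would have to retry the delete once the migration has finished. Suspending the request inside the API, as the comment suggests, would need a wait or queueing mechanism that this sketch does not cover.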

Fix proposed to branch: master
Review: https://review.openstack.org/153449

Changed in nova:
assignee: nobody → Yasuaki Nagata (yasuaki-nagata)
status: Confirmed → In Progress
melanie witt (melwitt) on 2015-04-20
tags: removed: live-migration ntt

Change abandoned by Joe Gordon (<email address hidden>) on branch: master
Review: https://review.openstack.org/153449
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

The solution to this problem is a little different now that the live migration monitor has been merged. Instead of a periodic task, I am thinking of an improvement to the monitor.

When a user triggers deletion of a VM that is being live migrated, the live migration monitor will assume that the LM eventually completed and will call "post_method" on the destination host (because the VM disappeared from the source host). Depending on the scenario, during post_method the VM will either be in the DELETING state or will not exist. This causes exceptions on the destination host and leaves things in a messy state, because rollback is never called.

So I think that this check in live_migration_monitor:

    if ex.get_error_code() == libvirt.VIR_ERR_NO_DOMAIN:
        LOG.debug("VM is missing, migration finished",
                  instance=instance)
        info.type = libvirt.VIR_DOMAIN_JOB_COMPLETED

should also check whether the VM still exists in Nova. If it does not, it should set info.type to either libvirt.VIR_DOMAIN_JOB_FAILED or libvirt.VIR_DOMAIN_JOB_CANCELLED so that rollback is called on the destination host.

After this fix, I believe this situation will be a corner case that happens only if nova-compute is restarted during LM, and that should be fixed by the blueprint manager-restart-during-migration:
https://blueprints.launchpad.net/nova/+spec/manager-restart-during-migration
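
The decision proposed for the monitor in this comment can be sketched roughly as follows. This is a standalone illustration, not the nova driver code: the JobType enum is a hypothetical stand-in for the libvirt.VIR_DOMAIN_JOB_* constants, and job_type_for_missing_domain and its set-of-UUIDs argument are assumptions made so the example runs on its own.

```python
from enum import Enum


class JobType(Enum):
    # Hypothetical stand-ins for libvirt.VIR_DOMAIN_JOB_COMPLETED and
    # libvirt.VIR_DOMAIN_JOB_CANCELLED; the real values are not used here.
    COMPLETED = "completed"
    CANCELLED = "cancelled"


def job_type_for_missing_domain(instance_uuid, nova_instance_uuids):
    """Decide how to interpret a domain that vanished from the source host.

    nova_instance_uuids is assumed to be the set of instance UUIDs that
    still exist in Nova at the time of the check.
    """
    if instance_uuid in nova_instance_uuids:
        # The instance still exists, so a missing source domain most
        # likely means the migration finished.
        return JobType.COMPLETED
    # The instance was deleted mid-migration: report a cancelled job so
    # that rollback runs on the destination instead of post_method.
    return JobType.CANCELLED
```

The comment leaves the choice between FAILED and CANCELLED open. The sketch picks CANCELLED arbitrarily, since either would trigger the rollback path described.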

tags: added: live-migrate

Yasuaki, are you still working on this? I would like to propose a new fix.

Changed in nova:
assignee: Yasuaki Nagata (yasuaki-nagata) → Bartosz Fic (bartosz-fic)
Changed in nova:
assignee: Bartosz Fic (bartosz-fic) → Pawel Koniszewski (pawel-koniszewski)
Paul Murray (pmurray) on 2015-11-06
tags: added: live-migration
removed: live-migrate
Changed in nova:
assignee: Pawel Koniszewski (pawel-koniszewski) → nobody
status: In Progress → Confirmed
Changed in nova:
assignee: nobody → Maciej Szankin (mszankin)
stgleb (gstepanov) wrote :

@mszankin are you still working on that issue?

Maciej Szankin (mszankin) wrote :

@gstepanov: I was off for 3 weeks and plan to continue on this task. Is it a priority for you, and do you want to take over?

Solving an inconsistency: This bug report has an assignee and it looks
like this could result in a patch. Therefore I switch the status to
"In Progress".
Dear assignee, please provide a (WIP) patch in the next 2 weeks. If you
stop working on this report, please remove yourself as assignee and
switch the status back. If you need assistance, reach out on the
IRC channel #openstack-nova or use the mailing list.

Changed in nova:
status: Confirmed → In Progress
Maciej Szankin (mszankin) wrote :

No solution yet and I am going offline for at least a week - removing myself for now.

Changed in nova:
assignee: Maciej Szankin (mszankin) → nobody

Solving an inconsistency: The status of a bug report is "in progress" with no assignee. Hence, changing status from "In progress" to "Confirmed".

Changed in nova:
status: In Progress → Confirmed

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which lead to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
  Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
importance: Medium → Undecided
status: Confirmed → Expired

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/185958
Reason: This code hasn't been updated in a long time, and is in merge conflict. I am going to abandon this review, but feel free to restore it if you're still working on this.

CONFIRMED FOR: Newton

Changed in nova:
status: Expired → In Progress
assignee: nobody → Maciej Jozefczyk (maciej.jozefczyk)

I believe the issue still exists on master as well.
IMO it could be fixed in two ways:
1) Recheck and modify Bartosz and Pawel's change to the monitor.

2) Improve the already existing _cleanup_running_deleted_instances periodic task to search for deleted instances without the host filter (modify _running_deleted_instances).
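
To illustrate why the host filter in option 2 matters, here is a small self-contained sketch. It does not reproduce nova's _running_deleted_instances: the running_deleted_instances function, the dict-based records and the optional host argument are hypothetical and only model the filtering logic.

```python
def running_deleted_instances(db_instances, hypervisor_uuids, host=None):
    """Return deleted DB records whose guests still exist on the hypervisor.

    db_instances: list of dicts with "uuid", "host" and "deleted" keys
        (a hypothetical stand-in for database records).
    hypervisor_uuids: set of guest UUIDs found on the local hypervisor.
    host: when given, only records whose "host" matches are considered,
        which models the behaviour with the host filter in place.
    """
    candidates = [inst for inst in db_instances if inst["deleted"]]
    if host is not None:
        candidates = [inst for inst in candidates if inst["host"] == host]
    return [inst for inst in candidates if inst["uuid"] in hypervisor_uuids]


# A VM deleted mid-migration: the database still records the source host,
# but the leftover guest lives on the destination hypervisor.
records = [{"uuid": "vm-1", "host": "source", "deleted": True}]
on_destination = {"vm-1"}

with_filter = running_deleted_instances(
    records, on_destination, host="destination")
without_filter = running_deleted_instances(records, on_destination)
```

With the filter, the destination never considers the record because its host field still names the source, so the leftover guest is not found. Without the filter, the record matches the guest on the destination and can be cleaned up.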

Reviewed: https://review.openstack.org/491808
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=aba3f649323d328f8122d372dea38256dfed6a52
Submitter: Jenkins
Branch: master

commit aba3f649323d328f8122d372dea38256dfed6a52
Author: Maciej Józefczyk <email address hidden>
Date: Tue Aug 8 15:37:26 2017 +0200

    Remove host filter for _cleanup_running_deleted_instances periodic task

    Periodic task _cleanup_running_deleted_instances() looks for orphaned
    and running instances on hypervisor that should be deleted.
    The problem is it checks if running instance has the same
    hypervisor defined as it is in nova database.

    In bug #1285000 it has been found that removing instance during
    its migration could lead to abandon instance files on destination
    host.

    This change removes host filter in _running_deleted_instances() to
    find also orphaned instances that are running on 'post migration'
    destination host.

    Change-Id: Idd1b58b85329b8e021eba4bc27f577af1b3338f4
    Partial-Bug: #1285000

Matt Riedemann (mriedem) wrote :

Marked newton as won't fix for this since we're past the support window for newton on non-critical fixes like this one.

https://docs.openstack.org/project-team-guide/stable-branches.html

Changed in nova:
importance: Undecided → Medium
importance: Medium → High
importance: High → Medium

Reviewed: https://review.openstack.org/494973
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c95f00aebf905c5d9c9fd55a6247b4e0a52d0485
Submitter: Jenkins
Branch: stable/pike

commit c95f00aebf905c5d9c9fd55a6247b4e0a52d0485
Author: Maciej Józefczyk <email address hidden>
Date: Tue Aug 8 15:37:26 2017 +0200

    Remove host filter for _cleanup_running_deleted_instances periodic task

    Periodic task _cleanup_running_deleted_instances() looks for orphaned
    and running instances on hypervisor that should be deleted.
    The problem is it checks if running instance has the same
    hypervisor defined as it is in nova database.

    In bug #1285000 it has been found that removing instance during
    its migration could lead to abandon instance files on destination
    host.

    This change removes host filter in _running_deleted_instances() to
    find also orphaned instances that are running on 'post migration'
    destination host.

    Change-Id: Idd1b58b85329b8e021eba4bc27f577af1b3338f4
    Partial-Bug: #1285000
    (cherry picked from commit aba3f649323d328f8122d372dea38256dfed6a52)

tags: added: in-stable-pike

Change abandoned by Maciej Jozefczyk (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/494975
Reason: Ok, so leaving this now.

Reviewed: https://review.openstack.org/494974
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b03f548765faa2d5b609874506b714158d772d56
Submitter: Jenkins
Branch: stable/ocata

commit b03f548765faa2d5b609874506b714158d772d56
Author: Maciej Józefczyk <email address hidden>
Date: Tue Aug 8 15:37:26 2017 +0200

    Remove host filter for _cleanup_running_deleted_instances periodic task

    Periodic task _cleanup_running_deleted_instances() looks for orphaned
    and running instances on hypervisor that should be deleted.
    The problem is it checks if running instance has the same
    hypervisor defined as it is in nova database.

    In bug #1285000 it has been found that removing instance during
    its migration could lead to abandon instance files on destination
    host.

    This change removes host filter in _running_deleted_instances() to
    find also orphaned instances that are running on 'post migration'
    destination host.

    Change-Id: Idd1b58b85329b8e021eba4bc27f577af1b3338f4
    Partial-Bug: #1285000
    (cherry picked from commit aba3f649323d328f8122d372dea38256dfed6a52)

tags: added: in-stable-ocata
Matt Riedemann (mriedem) on 2017-10-04
Changed in nova:
status: In Progress → Fix Released