Set migration status to 'error' instead of 'failed' during live-migration

Bug #1470420 reported by Rajesh Tailor on 2015-07-01
This bug affects 1 person
Affects: OpenStack Compute (nova)
Importance: Low
Assigned to: Abhishek Kekane

Bug Description

In the resize, confirm-resize and revert-resize operations, the migration
status is marked as 'error' if the respective operation fails.

Migration object support has been added to the live-migration operation, which
marks the migration status as 'failed' if the live-migration fails partway through.

To make live-migration consistent with resize, confirm-resize and revert-resize, it should mark the migration status as 'error' instead of 'failed' on failure.
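
The requested behavior can be illustrated with a minimal sketch. This is not nova's code: the `Migration` class, the `run_live_migration` helper and the 'running'/'completed' status values are hypothetical stand-ins, used only to show the failure path recording 'error' rather than 'failed'.

```python
from dataclasses import dataclass


@dataclass
class Migration:
    """Hypothetical stand-in for a migration record (not nova's object)."""
    migration_type: str
    status: str = "running"  # assumed initial status for this sketch


def run_live_migration(migration, do_migrate):
    """Call do_migrate() and record the outcome on the migration record."""
    try:
        do_migrate()
    except Exception:
        # Use the same failure status as resize, confirm-resize and
        # revert-resize, rather than a live-migration-only 'failed'.
        migration.status = "error"
        raise
    migration.status = "completed"  # assumed success status


def failing_driver_call():
    """Simulates a driver call that fails partway through."""
    raise RuntimeError("simulated live-migration failure")


migration = Migration(migration_type="live-migration")
try:
    run_live_migration(migration, failing_driver_call)
except RuntimeError:
    pass
print(migration.status)
```

With a single failure status across all migration types, any consumer that filters on 'error' sees failed live-migrations as well.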

Changed in nova:
assignee: nobody → Rajesh Tailor (rajesh-tailor)
description: updated
tags: added: live-migration

@Rajesh Tailor:

Since you are set as the assignee, I am switching the status to 'In Progress'.

Changed in nova:
status: New → In Progress
tags: added: live-migrate
removed: live-migration
Shuquan Huang (shuquan) wrote :

@rajesh, are you still working on this bug? If not, I'd like to take it. Thanks. :)

Rajesh Tailor (rajesh-tailor) wrote :

Hi Shuquan,

I am working on this bug and will push a patch for it as soon as possible.

Shuquan Huang (shuquan) wrote :

Sure. Go ahead. :)

Changed in nova:
status: In Progress → Won't Fix
importance: Undecided → Low
status: Won't Fix → In Progress
Paul Murray (pmurray) on 2015-11-06
tags: added: live-migration
removed: live-migrate
Rajesh Tailor (rajesh-tailor) wrote :

Hi

The following are some approaches to solving this issue. Please suggest which would be the better way.

1) As suggested by Paul Murray, we can modify the resize operation to set the migration status to 'failed' when a resize fails.
In this case, we need to modify the periodic task _cleanup_incomplete_migrations to filter migrations on 'failed' instead of 'error'.

2) We can add a new migration status, 'cleaned', which will be set in the periodic task _cleanup_incomplete_migrations.

The periodic task would filter migrations whose status is 'error' or 'failed'. Once the instance files are deleted from the compute node (either source or dest node), it would set the status to 'cleaned' so that the same record is not picked up in subsequent runs.

3) As suggested by Nikola Dipanov, it is reasonable to have retry logic on the self.driver.live_migration call. In that case, if the retries are not successful (i.e. it's an unrecoverable situation), the migration status would ultimately be set to 'error' by _rollback_live_migration. But as of now, we don't have retry logic on the live_migration driver call.

4) We can stick with the patch that is currently under review and change the migration status from 'failed' to 'error' wherever required.
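
Approach 2 could be sketched roughly as follows. This is a simplified illustration, not the real periodic task: the dictionary-based records, the `delete_instance_files` callback and the function signature are assumptions, and any check that the instance has actually been deleted is omitted.

```python
# Statuses the hypothetical cleanup pass would act on under approach 2.
CLEANUP_STATUSES = {"error", "failed"}


def cleanup_incomplete_migrations(migrations, delete_instance_files):
    """Hypothetical periodic-task body: clean up, then mark as 'cleaned'."""
    cleaned_ids = []
    for migration in migrations:
        if migration["status"] not in CLEANUP_STATUSES:
            continue
        delete_instance_files(migration["instance_uuid"])
        # Marking the record keeps it out of the filter on later runs.
        migration["status"] = "cleaned"
        cleaned_ids.append(migration["id"])
    return cleaned_ids


records = [
    {"id": 1, "instance_uuid": "uuid-a", "status": "error"},
    {"id": 2, "instance_uuid": "uuid-b", "status": "failed"},
    {"id": 3, "instance_uuid": "uuid-c", "status": "completed"},
]
deleted = []
first_run = cleanup_incomplete_migrations(records, deleted.append)
second_run = cleanup_incomplete_migrations(records, deleted.append)
print(first_run, second_run)
```

In this sketch the first run handles records 1 and 2, and the second run finds nothing left to do because both records are now 'cleaned'.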

Changed in nova:
assignee: Rajesh Tailor (rajesh-tailor) → Abhishek Kekane (abhishek-kekane)

Reviewed: https://review.openstack.org/215483
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d61e15818c1d108275b3286a6665fa3e6540e7e7
Submitter: Jenkins
Branch: master

commit d61e15818c1d108275b3286a6665fa3e6540e7e7
Author: Rajesh Tailor <email address hidden>
Date: Thu Jul 2 03:22:01 2015 -0700

    Set migration status to 'error' on live-migration failure

    (A) In resize, confirm-resize and revert-resize operation, migration status
    is marked as 'error' in case of failure for respective operation.

    Migration object support is added in live-migration operation, which mark
    migration status to 'failed' if live-migration operation fails in-between.

    To make live-migration consistent with resize, confirm-resize and revert-
    resize operation, it needs to mark migration status to 'error' instead of
    'failed' in case of failure.

    (B) Apart from consistency, proposed change fixes issue (similar to [1])
    which might occur on live-migration failure as follows:
    If live-migration fails (which sets migration status to 'failed') after
    copying instance files from source to dest node and then user request for
    instance deletion. In that case, delete api will only remove instance
    files from instance.host and not from other host (which could be either
    source or dest node but not instance.host). Since instance is already
    deleted, instance files will remain on other host (not instance.host).

    Set migration status to 'error' on live-migration failure, so that
    periodic task _cleanup_incomplete_migrations [2] will remove orphaned
    instance files from compute nodes after instance deletion in above case.

    [1] https://bugs.launchpad.net/nova/+bug/1392527
    [2] https://review.openstack.org/#/c/219299/

    DocImpact: On live-migration failure, set migration status to 'error'
    instead of 'failed'.

    Change-Id: I7a0c5a32349b0d3604802d22e83a3c2dab4b1370
    Closes-Bug: 1470420
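
The scenario in part (B) of the commit message can be walked through with a small simulation. Everything here is a hypothetical model (host names, the dictionary layout and the `simulate` function are assumptions); it only mirrors the described behavior that the delete API removes files from instance.host and the periodic cleanup considers migrations in 'error'.

```python
def simulate(failure_status):
    """Return the hosts that still hold instance files after deletion."""
    # After a failed live-migration, files exist on both nodes.
    files_on_host = {"source": {"inst-1"}, "dest": {"inst-1"}}
    instance_host = "source"
    migration = {
        "instance": "inst-1",
        "source": "source",
        "dest": "dest",
        "status": failure_status,
    }

    # Delete API: removes instance files only from instance.host.
    files_on_host[instance_host].discard("inst-1")

    # Periodic cleanup: only looks at migrations in 'error'.
    if migration["status"] == "error":
        for host in (migration["source"], migration["dest"]):
            files_on_host[host].discard(migration["instance"])

    return sorted(host for host, files in files_on_host.items() if files)


print(simulate("failed"))
print(simulate("error"))
```

In this model, 'failed' leaves orphaned files on the dest node, while 'error' lets the cleanup pass remove them.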

Changed in nova:
status: In Progress → Fix Released

This issue was fixed in the openstack/nova 14.0.0.0b2 development milestone.

Reviewed: https://review.openstack.org/353851
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8825efa3b263f1334aa78c786b01c9dfdd3ad726
Submitter: Jenkins
Branch: stable/mitaka

commit 8825efa3b263f1334aa78c786b01c9dfdd3ad726
Author: Rajesh Tailor <email address hidden>
Date: Thu Jul 2 03:22:01 2015 -0700

    Set migration status to 'error' on live-migration failure

    Change-Id: I7a0c5a32349b0d3604802d22e83a3c2dab4b1370
    Closes-Bug: 1470420
    (cherry picked from commit d61e15818c1d108275b3286a6665fa3e6540e7e7)

tags: added: in-stable-mitaka

This issue was fixed in the openstack/nova 13.1.2 release.

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/247519
Reason: This patch has been sitting unchanged for more than 12 weeks. I am therefore going to abandon it to keep the nova review queue sane. Please feel free to restore the change if you're still working on it.
