Set migration status to 'error' instead of 'failed' during live-migration

Bug #1470420 reported by Rajesh Tailor
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Low
Assigned to: Abhishek Kekane

Bug Description

In the resize, confirm-resize and revert-resize operations, the migration
status is set to 'error' if the respective operation fails.

Migration object support was added to the live-migration operation, which
sets the migration status to 'failed' if the live migration fails partway through.

To make live-migration consistent with the resize, confirm-resize and revert-resize operations, it should set the migration status to 'error' instead of 'failed' on failure.
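
The requested behaviour can be illustrated with a minimal sketch. This is not nova's code: the `Migration` dataclass, the `run_migration` helper and the 'running' and 'completed' status values are hypothetical stand-ins, used only to show one failure path writing 'error' regardless of migration type.

```python
from dataclasses import dataclass


@dataclass
class Migration:
    """Hypothetical stand-in for a migration record (not nova's Migration object)."""
    migration_type: str
    status: str = "running"  # illustrative initial value


def run_migration(migration, operation):
    """Run `operation`; on any failure mark the record 'error' and re-raise."""
    try:
        operation()
    except Exception:
        # Same failure status for resize and live-migration alike.
        migration.status = "error"
        raise
    migration.status = "completed"  # illustrative success value
    return migration


def failing_live_migration():
    raise RuntimeError("simulated failure partway through live-migration")


migration = Migration(migration_type="live-migration")
try:
    run_migration(migration, failing_live_migration)
except RuntimeError:
    pass
print(migration.status)
```

With a single failure value, anything that filters migrations on 'error' sees failed live migrations as well as failed resizes.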

Changed in nova:
assignee: nobody → Rajesh Tailor (rajesh-tailor)
description: updated
tags: added: live-migration
Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup

@Rajesh Tailor:

Since you are set as assignee, I am switching the status to 'In Progress'.

Changed in nova:
status: New → In Progress
tags: added: live-migrate
removed: live-migration
Shuquan Huang (shuquan) wrote :

@rajesh, are you still working on this bug? If not, I'd like to take it. Thanks. :)

Rajesh Tailor (rajesh-tailor) wrote :

Hi Shuquan,

I am working on this bug and will push a patch for it as soon as possible.

Shuquan Huang (shuquan) wrote :

Sure. Go ahead. :)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/215483

Changed in nova:
status: In Progress → Won't Fix
importance: Undecided → Low
status: Won't Fix → In Progress
Paul Murray (pmurray)
tags: added: live-migration
removed: live-migrate
Rajesh Tailor (rajesh-tailor) wrote :

Hi

The following are some approaches to solving this issue. Please suggest which would be best.

1) As suggested by Paul Murray, we can modify the resize operation to set the migration status to 'failed' when resize fails.
In this case, we need to modify the periodic task _cleanup_incomplete_migrations to filter migrations on 'failed' status instead of 'error'.

2) We can add a new migration status, 'cleaned', which will be set in the periodic task _cleanup_incomplete_migrations.

The periodic task would select migrations with 'error' or 'failed' status and, once the instance files are deleted from the compute node (either source or dest node), set the new 'cleaned' status so that the same record is not picked up in subsequent runs of the task.

3) As suggested by Nikola Dipanov, it is reasonable to have retry logic on the self.driver.live_migration call. In that case, if the retries are not successful (i.e. it's an unrecoverable situation), the migration status would ultimately be set to 'error' by _rollback_live_migration. But as of now, we don't have retry logic on the live_migration driver call.

4) We can stick with the patch which is currently under review and change the migration status from 'failed' to 'error' wherever required.
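
Approach 2 above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the real periodic task: `MigrationRecord`, the `instance_deleted` flag, the `delete_instance_files` callback and the 'finished' status in the sample data are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Statuses that the cleanup pass would select under approach 2.
CLEANUP_STATUSES = {"error", "failed"}


@dataclass
class MigrationRecord:
    """Hypothetical migration row; field names are assumptions."""
    instance_uuid: str
    status: str
    instance_deleted: bool


def cleanup_incomplete_migrations(migrations, delete_instance_files):
    """Delete leftover files for failed migrations of deleted instances.

    Each processed record is marked 'cleaned' so that later runs skip it.
    Returns the instance UUIDs handled in this run.
    """
    handled = []
    for migration in migrations:
        if migration.status in CLEANUP_STATUSES and migration.instance_deleted:
            delete_instance_files(migration.instance_uuid)
            migration.status = "cleaned"
            handled.append(migration.instance_uuid)
    return handled


deleted = []
records = [
    MigrationRecord("uuid-1", "error", True),
    MigrationRecord("uuid-2", "failed", True),
    MigrationRecord("uuid-3", "error", False),    # instance still exists
    MigrationRecord("uuid-4", "finished", True),  # illustrative non-failure status
]
first_run = cleanup_incomplete_migrations(records, deleted.append)
second_run = cleanup_incomplete_migrations(records, deleted.append)
print(first_run, second_run)
```

The second run returns an empty list because both handled records now carry 'cleaned', which is the point of adding the extra status.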

Changed in nova:
assignee: Rajesh Tailor (rajesh-tailor) → Abhishek Kekane (abhishek-kekane)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/215483
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d61e15818c1d108275b3286a6665fa3e6540e7e7
Submitter: Jenkins
Branch: master

commit d61e15818c1d108275b3286a6665fa3e6540e7e7
Author: Rajesh Tailor <email address hidden>
Date: Thu Jul 2 03:22:01 2015 -0700

    Set migration status to 'error' on live-migration failure

    (A) In resize, confirm-resize and revert-resize operation, migration status
    is marked as 'error' in case of failure for respective operation.

    Migration object support is added in live-migration operation, which mark
    migration status to 'failed' if live-migration operation fails in-between.

    To make live-migration consistent with resize, confirm-resize and revert-
    resize operation, it needs to mark migration status to 'error' instead of
    'failed' in case of failure.

    (B) Apart from consistency, proposed change fixes issue (similar to [1])
    which might occur on live-migration failure as follows:
    If live-migration fails (which sets migration status to 'failed') after
    copying instance files from source to dest node and then user request for
    instance deletion. In that case, delete api will only remove instance
    files from instance.host and not from other host (which could be either
    source or dest node but not instance.host). Since instance is already
    deleted, instance files will remain on other host (not instance.host).

    Set migration status to 'error' on live-migration failure, so that
    periodic task _cleanup_incomplete_migrations [2] will remove orphaned
    instance files from compute nodes after instance deletion in above case.

    [1] https://bugs.launchpad.net/nova/+bug/1392527
    [2] https://review.openstack.org/#/c/219299/

    DocImpact: On live-migration failure, set migration status to 'error'
    instead of 'failed'.

    Change-Id: I7a0c5a32349b0d3604802d22e83a3c2dab4b1370
    Closes-Bug: 1470420

Changed in nova:
status: In Progress → Fix Released
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 14.0.0.0b2

This issue was fixed in the openstack/nova 14.0.0.0b2 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/353851

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/mitaka)

Reviewed: https://review.openstack.org/353851
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8825efa3b263f1334aa78c786b01c9dfdd3ad726
Submitter: Jenkins
Branch: stable/mitaka

commit 8825efa3b263f1334aa78c786b01c9dfdd3ad726
Author: Rajesh Tailor <email address hidden>
Date: Thu Jul 2 03:22:01 2015 -0700

    Set migration status to 'error' on live-migration failure

    (A) In resize, confirm-resize and revert-resize operation, migration status
    is marked as 'error' in case of failure for respective operation.

    Migration object support is added in live-migration operation, which mark
    migration status to 'failed' if live-migration operation fails in-between.

    To make live-migration consistent with resize, confirm-resize and revert-
    resize operation, it needs to mark migration status to 'error' instead of
    'failed' in case of failure.

    (B) Apart from consistency, proposed change fixes issue (similar to [1])
    which might occur on live-migration failure as follows:
    If live-migration fails (which sets migration status to 'failed') after
    copying instance files from source to dest node and then user request for
    instance deletion. In that case, delete api will only remove instance
    files from instance.host and not from other host (which could be either
    source or dest node but not instance.host). Since instance is already
    deleted, instance files will remain on other host (not instance.host).

    Set migration status to 'error' on live-migration failure, so that
    periodic task _cleanup_incomplete_migrations [2] will remove orphaned
    instance files from compute nodes after instance deletion in above case.

    [1] https://bugs.launchpad.net/nova/+bug/1392527
    [2] https://review.openstack.org/#/c/219299/

    DocImpact: On live-migration failure, set migration status to 'error'
    instead of 'failed'.

    Change-Id: I7a0c5a32349b0d3604802d22e83a3c2dab4b1370
    Closes-Bug: 1470420
    (cherry picked from commit d61e15818c1d108275b3286a6665fa3e6540e7e7)

tags: added: in-stable-mitaka
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 13.1.2

This issue was fixed in the openstack/nova 13.1.2 release.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/247519
Reason: This patch has been sitting unchanged for more than 12 weeks. I am therefore going to abandon it to keep the nova review queue sane. Please feel free to restore the change if you're still working on it.
