Failed live block migration leaves instance with inconsistent status

Bug #1051881 reported by Mate Lakat
This bug affects 4 people
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Fix Released | High | Jian Wen |
Folsom | Fix Released | High | Nikola Đipanov |

Bug Description

I have a master and a slave devstack installation. When I issued a live block migration to an invalid host, the request failed, but the status of the instance was still changed to "MIGRATING".

CONFIGURATION:
stack@DevStackOSDomU:~$ less /etc/nova/nova.conf | grep compute_driver
compute_driver=xenapi.XenAPIDriver

STEPS TO REPRODUCE:

1.) You have a server
stack@DevStackOSDomU:~/devstack$ nova list
+--------------------------------------+------+--------+------------------+
| ID | Name | Status | Networks |
+--------------------------------------+------+--------+------------------+
| 083a1d11-22f9-4e32-b476-21412b8bd725 | asd | ACTIVE | private=10.0.0.2 |

2.) And multiple compute resources
stack@DevStackOSDomU:~/devstack$ nova-manage service list
Binary Host Zone Status State
nova-compute DevStackOSDomU nova enabled :-)
...
nova-compute DevStackComputeSlave nova enabled :-)

3.) Server is running on DevStackOSDomU:
+-------------------------------------+------------------...
| Property | Value
+-------------------------------------+------------------...
...
| OS-EXT-SRV-ATTR:host | DevStackOSDomU
...

4.) You live migrate the server to an invalid host:
stack@DevStackOSDomU:~/devstack$ nova live-migration 083a1d11-22f9-4e32-b476-21412b8bd725 SomeHostThatDoesNotExist --block-migrate

5.) You get an error as expected:
...
ComputeHostNotFound: Compute host SomeHostThatDoesNotExist could not be found.

 (HTTP 400) (Request-ID: req-0dbf7ad3-c45b-4924-aa3d-9b62edd95fd0)

6.) But the server's status is changed to "MIGRATING":
stack@DevStackOSDomU:~/devstack$ nova list
+--------------------------------------+------+-----------+------------------+
| ID | Name | Status | Networks |
+--------------------------------------+------+-----------+------------------+
| 083a1d11-22f9-4e32-b476-21412b8bd725 | asd | MIGRATING | private=10.0.0.2 |
+--------------------------------------+------+-----------+------------------+

Tags: tempest
Mate Lakat (mate-lakat)
description: updated
Dan Prince (dan-prince) wrote :

Hi Mate,

I'm a little confused on this one. Would you mind adding the detailed steps one might need to follow to reproduce this issue to the ticket?

Changed in nova:
status: New → Incomplete
Mate Lakat (mate-lakat)
Changed in nova:
assignee: nobody → Mate Lakat (mate-lakat)
Mate Lakat (mate-lakat) wrote :

Dan: Sorry for the dodgy description; I have updated it with the appropriate steps. Please note that I also have a pending change request for a tempest test that covers this case:

https://review.openstack.org/#/c/13101/

description: updated
Rohit Karajgi (rohitk) wrote :

I could reproduce this with KVM. Trying to migrate to the same host using the CLI fails as expected, but the instance's task state moves to "migrating".
The task state should remain unchanged, or be reset to "None", for this use case.
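
The expectation stated above can be written down as a small check. What follows is only a sketch: `BuggyInstance` and `task_state_survives_failure` are hypothetical names, the class is an in-memory stand-in rather than nova code, and the faulty ordering it models (task state set before the destination is validated) is an assumption about how this kind of bug can arise, not something taken from the nova source.

```python
class BuggyInstance:
    """Hypothetical in-memory stand-in for a server; not nova code."""

    def __init__(self, known_hosts):
        self.known_hosts = set(known_hosts)
        self.task_state = None

    def live_migrate(self, dest):
        # Assumed faulty ordering: the task state is changed before the
        # destination is validated, and nothing resets it on failure.
        self.task_state = "migrating"
        if dest not in self.known_hosts:
            raise LookupError(f"Compute host {dest} could not be found.")
        self.task_state = None


def task_state_survives_failure(get_task_state, attempt_migration):
    """Return True if a failing migration leaves the task state unchanged."""
    before = get_task_state()
    try:
        attempt_migration()
    except Exception:
        return get_task_state() == before
    raise RuntimeError("migration was expected to fail but succeeded")


instance = BuggyInstance({"DevStackOSDomU", "DevStackComputeSlave"})
unchanged = task_state_survives_failure(
    lambda: instance.task_state,
    lambda: instance.live_migrate("SomeHostThatDoesNotExist"),
)
```

With the stand-in above, `unchanged` is `False`, which is the inconsistency described in this report; a corrected implementation would make the check return `True`.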

Jaroslav Henner (jhenner) wrote :

`--> nova list --host node-01...
+--------------------------------------+----------+--------+---------------------+
| ID | Name | Status | Networks |
+--------------------------------------+----------+--------+---------------------+
| 3a6228e0-b87b-4c79-9521-5b1ad439f628 | testicek | ACTIVE | demonet=192.168.0.2 |
+--------------------------------------+----------+--------+---------------------+

`--> nova live-migration 3a6228e0-b87b-4c79-9521-5b1ad439f628 node-01...
ERROR: Live migration of instance 3a6228e0-b87b-4c79-9521-5b1ad439f628 to host node-01... failed (HTTP 400)

`--> nova list --host node-01...
+--------------------------------------+----------+-----------+---------------------+
| ID | Name | Status | Networks |
+--------------------------------------+----------+-----------+---------------------+
| 3a6228e0-b87b-4c79-9521-5b1ad439f628 | testicek | MIGRATING | demonet=192.168.0.2 |
+--------------------------------------+----------+-----------+---------------------+

Changed in nova:
status: Incomplete → Confirmed
Thierry Carrez (ttx)
Changed in nova:
importance: Undecided → Medium
importance: Medium → High
Jian Wen (wenjianhn)
Changed in nova:
assignee: Mate Lakat (mate-lakat) → Jian Wen (wenjianhn)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/19616

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/19616
Committed: http://github.com/openstack/nova/commit/be62d6a86971abac57a1cc03c985ba1e97fd55cb
Submitter: Jenkins
Branch: master

commit be62d6a86971abac57a1cc03c985ba1e97fd55cb
Author: Jian Wen <email address hidden>
Date: Mon Jan 14 19:13:24 2013 +0800

    Handle compute node not available for live migration

    This patch handles exception.ComputeServiceUnavailable by restoring
    instance's vm_state and instance's task_state after live migration
    failure caused by unavailable source/dest compute node.

    Raises detailed HTTPBadRequest explanation for this exception.

    Fixes bug 973393 and bug 1051881

    Change-Id: If825b61fad9c4e3030f2e6c5002907255eaf3661
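
The approach described in the commit message (restore vm_state and task_state when the compute service is unavailable, then report a bad request with an explanation) can be illustrated roughly as follows. This is a minimal sketch and not the code from the patch: the instance is a plain dict, `restore_state_on` and `live_migrate` are hypothetical helpers, and `ComputeServiceUnavailable` and `BadRequest` are local stand-ins defined here rather than the nova or webob classes.

```python
from contextlib import contextmanager


class ComputeServiceUnavailable(Exception):
    """Local stand-in named after the exception in the commit message."""


class BadRequest(Exception):
    """Local stand-in for an HTTP 400 response carrying an explanation."""


@contextmanager
def restore_state_on(exc_type, instance):
    """Put vm_state and task_state back if exc_type escapes the block."""
    saved = (instance["vm_state"], instance["task_state"])
    try:
        yield
    except exc_type:
        instance["vm_state"], instance["task_state"] = saved
        raise


def live_migrate(instance, dest, available_hosts):
    """Hypothetical migration entry point operating on a dict instance."""
    try:
        with restore_state_on(ComputeServiceUnavailable, instance):
            instance["task_state"] = "migrating"
            if dest not in available_hosts:
                raise ComputeServiceUnavailable(
                    f"Compute service of {dest} is unavailable."
                )
            instance["host"] = dest
            instance["task_state"] = None
    except ComputeServiceUnavailable as exc:
        # Surface the reason to the caller instead of a bare failure.
        raise BadRequest(str(exc)) from exc
```

Calling `live_migrate` with a destination that is not in `available_hosts` raises `BadRequest`, and the instance dict keeps the vm_state and task_state it had before the call.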

Changed in nova:
status: In Progress → Fix Committed
tags: added: tempest
Thierry Carrez (ttx)
Changed in nova:
milestone: none → grizzly-3
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/folsom)

Fix proposed to branch: stable/folsom
Review: https://review.openstack.org/22873

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/folsom)

Reviewed: https://review.openstack.org/22873
Committed: http://github.com/openstack/nova/commit/20294279ee1d6d82dbb87c4c29e3a8b9fd0cb8bd
Submitter: Jenkins
Branch: stable/folsom

commit 20294279ee1d6d82dbb87c4c29e3a8b9fd0cb8bd
Author: Jian Wen <email address hidden>
Date: Mon Jan 14 19:13:24 2013 +0800

    Handle compute node not available for live migration

    This patch handles exception.ComputeServiceUnavailable by restoring
    instance's vm_state and instance's task_state after live migration
    failure caused by unavailable source/dest compute node.

    Raises detailed HTTPBadRequest explanation for this exception.

    Fixes bug 973393 and bug 1051881

    Conflicts:
     nova/scheduler/driver.py
     nova/scheduler/manager.py
     nova/tests/api/openstack/compute/contrib/test_admin_actions.py
     nova/tests/scheduler/test_scheduler.py

    Change-Id: If825b61fad9c4e3030f2e6c5002907255eaf3661

Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-3 → 2013.1