VM would go to ERROR when live migration if libvirt on target host is down

Bug #1233184 reported by Guangya Liu (Jay Lau)
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Fix Released | Medium | Vladik Romanovsky |

Bug Description

1) Stop libvirtd on target host
2) Live migrate the VM
3) The VM goes to ERROR state
4) Check the VM status with "virsh list": the VM is still running normally on the source host.

Tags: compute
summary: - VM would go to error when live migration if libvirt on target host is down
+ VM would go to ERROR when live migration if libvirt on target host is down
tags: added: compute
melanie witt (melwitt)
Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
Changed in nova:
assignee: nobody → Vladik Romanovsky (vladik-romanovsky)
Changed in nova:
status: Confirmed → In Progress
Yassine (yassine-lamgarchal) wrote :

Hi Jay,

Since this is a problem in the live migration process, I think the VM should not go to ERROR state, although an exception should still be raised. It would be nice to mark the compute node as "unavailable" when libvirtd is not running, to prevent this case.

We could add a function is_compute_healthy() to nova.virt.driver that performs some health checks on the compute node; it could be called periodically. Every driver would have to implement it.
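A minimal sketch of the proposed interface, assuming a simple driver/node model. Everything here is hypothetical: `ComputeDriver`, `FakeLibvirtDriver`, `ComputeNode` and `migration_candidates` are illustrative stand-ins, not nova's real classes, and `is_compute_healthy()` is the method proposed above rather than an existing nova API.

```python
import abc


class ComputeDriver(abc.ABC):
    """Hypothetical stand-in for a virt driver base class (not nova's real API)."""

    @abc.abstractmethod
    def is_compute_healthy(self) -> bool:
        """Return True if the hypervisor backing this node is usable."""


class FakeLibvirtDriver(ComputeDriver):
    """Toy driver; `connected` models whether libvirtd is answering."""

    def __init__(self, connected: bool = True):
        self.connected = connected

    def is_compute_healthy(self) -> bool:
        # A real driver would probe its hypervisor connection here;
        # this sketch just reports a flag.
        return self.connected


class ComputeNode:
    def __init__(self, name: str, driver: ComputeDriver):
        self.name = name
        self.driver = driver
        self.available = True

    def periodic_health_check(self) -> None:
        # Meant to run on a timer; the result is cached until the next run.
        self.available = self.driver.is_compute_healthy()


def migration_candidates(nodes):
    """Only nodes currently marked available may receive a live migration."""
    return [n.name for n in nodes if n.available]


# Usage: host-b's libvirtd is down, so it is filtered out after a check.
nodes = [
    ComputeNode("host-a", FakeLibvirtDriver()),
    ComputeNode("host-b", FakeLibvirtDriver(connected=False)),
]
for node in nodes:
    node.periodic_health_check()
candidates = migration_candidates(nodes)  # ['host-a']
```

Because `available` is only refreshed when `periodic_health_check()` runs, a host whose libvirtd dies between two runs still looks available until the next check.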

What do you think about that?

Guangya Liu (Jay Lau) (jay-lau-513) wrote :

Thanks Yassine!

I also thought about this solution, but since the health check is a **periodic task**, there would still be a small time window in which a VM could be deployed to a hypervisor whose libvirtd has a problem.

It seems that we could add more checks in _check_requested_destination() to verify that the target host is healthy before the live migration starts. Comments? ;-)
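The idea of checking the destination at request time can be sketched as follows. This is an illustration under assumptions, not nova's actual `_check_requested_destination()`: `check_requested_destination`, `live_migrate`, `DestinationUnhealthy` and the `probe` callable are hypothetical names.

```python
class DestinationUnhealthy(Exception):
    """Hypothetical error: the requested target host cannot accept the VM."""


def check_requested_destination(host: str, probe) -> None:
    """Hypothetical pre-migration check, evaluated at request time.

    `probe(host)` is assumed to contact the destination hypervisor right now
    (not read a cached periodic result) and return True if it answers.
    """
    try:
        healthy = probe(host)
    except OSError as exc:  # e.g. connection refused because libvirtd is down
        raise DestinationUnhealthy(f"{host}: hypervisor probe failed: {exc}") from exc
    if not healthy:
        raise DestinationUnhealthy(f"{host}: hypervisor reports unhealthy")


def live_migrate(instance: str, host: str, probe) -> str:
    check_requested_destination(host, probe)
    # The actual migration would start here; elided in this sketch.
    return f"migrating {instance} to {host}"


def libvirtd_down(host: str) -> bool:
    raise ConnectionRefusedError("libvirtd not running")


try:
    live_migrate("vm1", "host-b", libvirtd_down)
except DestinationUnhealthy as err:
    failure = str(err)  # "host-b: hypervisor probe failed: libvirtd not running"
```

A request-time probe narrows the window but cannot close it entirely: libvirtd could still stop between the probe and the migration, so the migration path itself still needs to handle that failure.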

Vladik Romanovsky (vladik-romanovsky) wrote :

Hi everyone,

I proposed a patch for this a while ago: https://review.openstack.org/#/c/50629
However, it is still a work in progress (WIP).

I think the best approach is for the libvirt driver to respond with a meaningful error when the connection is broken.
That way we will be able to catch it in the conductor task.
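A minimal sketch of that approach, assuming the destination driver raises a dedicated exception when its libvirt connection is broken and the conductor catches it. `HypervisorUnavailable`, `Instance`, `check_destination` and `conductor_live_migrate` are hypothetical names for illustration; this is not the code of the patch linked above.

```python
class HypervisorUnavailable(Exception):
    """Hypothetical error a driver raises when its hypervisor connection is broken."""


class Instance:
    def __init__(self, uuid: str, host: str):
        self.uuid = uuid
        self.host = host
        self.vm_state = "active"
        self.task_state = None


def check_destination(dest: str, libvirt_up: bool) -> None:
    # Hypothetical driver call on the destination: translate a broken
    # libvirt connection into a specific, catchable exception.
    if not libvirt_up:
        raise HypervisorUnavailable(f"libvirt is unavailable on {dest}")


def conductor_live_migrate(instance: Instance, dest: str, libvirt_up: bool) -> None:
    instance.task_state = "migrating"
    try:
        check_destination(dest, libvirt_up)
    except HypervisorUnavailable:
        # Nothing was moved, so the VM is still running on the source host:
        # clear the task state but leave vm_state alone, then re-raise so
        # the API caller still learns that the migration failed.
        instance.task_state = None
        raise
    except Exception:
        # In this sketch, unexpected failures still mark the VM as ERROR.
        instance.vm_state = "error"
        instance.task_state = None
        raise
    # Success path (the actual migration is elided in this sketch).
    instance.host = dest
    instance.task_state = None


vm = Instance("vm1", "host-a")
try:
    conductor_live_migrate(vm, "host-b", libvirt_up=False)
except HypervisorUnavailable as err:
    error = str(err)
```

After the failed attempt, `vm` is still `active` on `host-a`, which matches what "virsh list" shows on the source host in the bug description.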

Moreover, I think we should rely on libvirt events (on the source host), as libvirt is the only component that knows the actual status of the VM.

In my opinion, the more general solution would be to disable the nova-compute service when libvirt is not available and re-enable it when it becomes available again.
I have started a discussion on this topic on the mailing list. Maybe you would like to share your thoughts there as well (subject: "Disabling nova-compute when a connection to libvirt is broken.").
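The disable/re-enable idea can be sketched as a callback on connection-state changes. `ComputeService`, `on_libvirt_connection_change` and the `AUTO_PREFIX` convention are hypothetical; how the callback would actually be wired to libvirt's connection notifications is left out of this sketch.

```python
AUTO_PREFIX = "AUTO: "  # marks disables made by this callback (hypothetical convention)


class ComputeService:
    """Hypothetical model of a nova-compute service record."""

    def __init__(self, host: str):
        self.host = host
        self.disabled = False
        self.disabled_reason = None


def on_libvirt_connection_change(service: ComputeService, connected: bool, reason: str = "") -> None:
    """Disable the service when libvirt goes away and re-enable it when it returns."""
    if not connected:
        if not service.disabled:
            service.disabled = True
            service.disabled_reason = AUTO_PREFIX + (reason or "connection to libvirt lost")
    elif service.disabled and (service.disabled_reason or "").startswith(AUTO_PREFIX):
        # Only undo our own automatic disable; an operator's manual disable stays.
        service.disabled = False
        service.disabled_reason = None


svc = ComputeService("host-b")
on_libvirt_connection_change(svc, connected=False, reason="libvirtd stopped")
# svc.disabled is True and svc.disabled_reason == "AUTO: libvirtd stopped"
on_libvirt_connection_change(svc, connected=True)
# svc.disabled is False again
```

Tagging automatic disables with a prefix is one way to avoid re-enabling a host that an operator disabled on purpose for unrelated reasons.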

What do you think?

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/50629
Committed: http://github.com/openstack/nova/commit/6df1b872cbfd8646614a52979904ca50a76c59c0
Submitter: Jenkins
Branch: master

commit 6df1b872cbfd8646614a52979904ca50a76c59c0
Author: Vladik Romanovsky <email address hidden>
Date: Wed Oct 16 15:07:23 2013 -0400

    There is no need to set VM status to ERROR on a failed migration

    VM state should not be set to ERROR when the
    migration failed because libvirt is unavailable on the destination.

    Fixes bug #1233184
    Change-Id: I36e4ed3842d7e33c1082ec95f860629eee23224e

Changed in nova:
status: In Progress → Fix Committed
Changed in nova:
milestone: none → icehouse-2
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-2 → 2014.1