Nova initiated Live Migration regression for vmware VCDriver

Bug #1192192 reported by Shawn Hartsock
This bug affects 8 people
Affects: OpenStack Compute (nova)
Status: Opinion
Importance: Wishlist
Assigned to: Unassigned

Bug Description

Nova's live-migration feature should work between hosts in the same cluster, but there is no way to specify which hosts in a cluster to move between. That effectively disables the live-migration feature.

> I've found here
> <https://wiki.openstack.org/wiki/HypervisorSupportMatrix> that the ESX/VC
> drivers support live migration. I also found the related method in the
> code; it uses the VMware API "MigrateVM_Task" function.
>
> But I couldn't understand how I should use live migration:
>
> - Standalone ESXi hosts do not support any migration. Therefore
> VMWareESXDriver also does not support migration. Correct me if I am wrong.
> - With vCenter (VMWareVCDriver) I could use vMotion to migrate VMs
> between members of a cluster. But nova sees the cluster as a single "host", and
> on "nova live-migration VM" the scheduler raises the exception "NoValidHost: No
> valid host was found."
>
> My question is: what is the use case of this function?
> <https://github.com/openstack/nova/blob/stable/grizzly/nova/virt/vmwareapi/vmops.py#L1018>

See: http://docs.openstack.org/trunk/openstack-compute/admin/content/live-migration-usage.html

Note: one possible fix for this is to implement a migration strategy that can move between hosts not in the same cluster.

Tags: vmware
Shawn Hartsock (hartsock) wrote :

There is no work-around. Hosts are occluded by clusters.

Changed in nova:
importance: Undecided → Critical
milestone: none → havana-3
description: updated
dan wendlandt (danwent) wrote :

Could you update the title of this to make it clear that you are talking about live migrations initiated via nova? Live migrations initiated via vCenter should still work, right?

Shawn Hartsock (hartsock) wrote :

Yes. This applies only when nova attempts to initiate a live migration from the CLI. The automated live migrations conducted by DRS-enabled clusters work fine.

summary: - Live Migration regression for vmware VCDriver
+ Nova initiated Live Migration regression for vmware VCDriver
description: updated
Changed in nova:
status: New → Confirmed
Changed in nova:
importance: Critical → High
Shawn Hartsock (hartsock) wrote :

Currently seeing this error:

nova live-migration <instance Id> <host>

Here is the stack trace:
 ERROR nova.openstack.common.rpc.amqp [req-27abd343-13b0-4080-a468-836baed6100f admin demo] Exception during message handling
 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/openstack/common/rpc/amqp.py", line 426, in _process_data
 TRACE nova.openstack.common.rpc.amqp **args)
 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
 TRACE nova.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/exception.py", line 99, in wrapped
 TRACE nova.openstack.common.rpc.amqp temp_level, payload)
 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
 TRACE nova.openstack.common.rpc.amqp self.gen.next()
 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/exception.py", line 76, in wrapped
 TRACE nova.openstack.common.rpc.amqp return f(self, context, *args, **kw)
 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/compute/manager.py", line 3442, in check_can_live_migrate_destination
 TRACE nova.openstack.common.rpc.amqp block_migration, disk_over_commit)
 TRACE nova.openstack.common.rpc.amqp File "/opt/stack/nova/nova/virt/driver.py", line 567, in check_can_live_migrate_destination
 TRACE nova.openstack.common.rpc.amqp raise NotImplementedError()
 TRACE nova.openstack.common.rpc.amqp NotImplementedError
 TRACE nova.openstack.common.rpc.amqp
 ERROR nova.openstack.common.rpc.common [req-27abd343-13b0-4080-a468-836baed6100f admin demo] Returning exception to caller
 ERROR nova.openstack.common.rpc.common [req-27abd343-13b0-4080-a468-836baed6100f admin demo] ['Traceback (most recent call last):\n', ' File "/opt/stack/nova/nova/openstack/common/rpc/amqp.py", line 426, in _process_data\n **args)\n', ' File "/opt/stack/nova/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch\n result = getattr(proxyobj, method)(ctxt, **kwargs)\n', ' File "/opt/stack/nova/nova/exception.py", line 99, in wrapped\n temp_level, payload)\n', ' File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__\n self.gen.next()\n', ' File "/opt/stack/nova/nova/exception.py", line 76, in wrapped\n return f(self, context, *args, **kw)\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 3442, in check_can_live_migrate_destination\n block_migration, disk_over_commit)\n', ' File "/opt/stack/nova/nova/virt/driver.py", line 567, in check_can_live_migrate_destination\n raise NotImplementedError()\n', 'NotImplementedError\n']
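
The trace ends in the base virt driver (nova/virt/driver.py) rather than in VMware-specific code, which suggests the VMware driver does not override this check. A minimal sketch of that pattern follows. BaseDriver, VCDriverStandIn, and try_live_migrate are hypothetical stand-ins, not the real Nova classes, and the method signature is simplified for illustration.

```python
class BaseDriver:
    """Toy stand-in for a virt driver base class (not the real Nova code)."""

    def check_can_live_migrate_destination(self, context, instance,
                                           block_migration=False,
                                           disk_over_commit=False):
        # The base class offers no implementation; a driver must override it.
        raise NotImplementedError()


class VCDriverStandIn(BaseDriver):
    """Hypothetical driver that does not override the destination check."""


def try_live_migrate(driver, instance):
    """Run the destination check and report the outcome as a string."""
    try:
        driver.check_can_live_migrate_destination(None, instance)
    except NotImplementedError:
        return "not supported by driver"
    return "destination check passed"


print(try_live_migrate(VCDriverStandIn(), "instance-1"))
```

Because the subclass inherits the base method unchanged, any caller that reaches the destination check gets NotImplementedError, which matches the last frames of the trace above.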

Changed in nova:
importance: High → Low
milestone: havana-3 → none
importance: Low → High
milestone: none → havana-3
Changed in nova:
importance: High → Medium
milestone: havana-3 → none
Changed in nova:
milestone: none → next
description: updated
John Garbutt (johngarbutt) wrote :

This is a blueprint not a bug really, I am guessing?

Tracy Jones (tjones-i)
Changed in nova:
assignee: Shawn Hartsock (hartsock) → nobody
Changed in nova:
assignee: nobody → Sabari Kumar Murugesan (smurugesan)
Changed in nova:
milestone: next → none
Guangya Liu (Jay Lau) (jay-lau-513) wrote :

Hi Sabari, may I know the progress of this bug?

I just want to contribute and share some ideas here. IMHO, we need to support live migration both with and without a target host when using VCDriver.

1) For migration with a target host, we may need to enhance the nova API to support migrating a VM to a specified node, so the live-migration API may need to change the host format to host:node.

2) For live migration without a target host, the nova scheduler selects the target host. Before selecting, it adds the host where the VM is running (for VMware, the vCenter host) to attempted_hosts. This may not be accurate; we may need to compare both host and node to decide whether they are the same.
https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L145

3) Last, we need to implement check_can_live_migrate_destination() for the VMware driver.

I can help with 1) and 2) if you agree with my proposal. Please comment if you have any feedback. I hope this can be fixed soon, as live migration is an important feature for many customers. Thanks.
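
Points 1) and 2) above can be sketched roughly as follows. This is only an illustration of the proposal: the "host:node" format is the suggestion from this comment, not an existing Nova API, and Location, parse_target, exclude_source, and the host name "vc1" are all hypothetical.

```python
from typing import List, NamedTuple, Optional


class Location(NamedTuple):
    host: str             # compute service host (for VMware, the vCenter host)
    node: Optional[str]   # node behind that host, e.g. a cluster name


def parse_target(target: str) -> Location:
    """Split a proposed 'host:node' target; a bare 'host' carries no node."""
    host, _sep, node = target.partition(":")
    if not host:
        raise ValueError("target must start with a host name")
    return Location(host, node or None)


def exclude_source(source: Location,
                   candidates: List[Location]) -> List[Location]:
    """Drop only the candidate that matches the source on host AND node."""
    return [c for c in candidates if c != source]


source = parse_target("vc1:Cluster1")
candidates = [parse_target("vc1:Cluster1"), parse_target("vc1:Cluster2")]
print(exclude_source(source, candidates))
```

Comparing on host alone would discard both candidates here, since they share the host "vc1"; comparing the (host, node) pair keeps Cluster2 as a possible destination.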

Guangya Liu (Jay Lau) (jay-lau-513) wrote :

Another question: if we want live migration, we may also need to enable VCDriver to report all ESX servers. @Sabari, I hope to get some comments from you. Thanks.

Guangya Liu (Jay Lau) (jay-lau-513) wrote :

@John Garbutt, I also think this is a blueprint and deserves some discussion.

1) Live migration with target host
2) Live migration without target host
3) Live migration between clusters in same DC
4) Live migration in same DC without cluster
5) ....

There are many cases we need to consider.

Guangya Liu (Jay Lau) (jay-lau-513) wrote :

Two typical use cases:

 DC1
    |
    |----Cluster1
    |          |
    |          |----9.111.249.56
    |
    |----Cluster2
               |
               |----9.111.249.49

Case 1)
One nova compute manages two clusters.

nova.conf:
cluster_name=Cluster2
cluster_name=Cluster1

Live migration fails because the target host and the source host are considered to be the same host.

Case 2)
nova compute 1 manages Cluster1
nova.conf:
cluster_name=Cluster1

nova compute 2 manages Cluster2
nova.conf:
cluster_name=Cluster2

Live migration fails because of:
2014-02-12 11:47:38.557 32416 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/virt/driver.py", line 598, in check_can_live_migrate_destination
2014-02-12 11:47:38.557 32416 TRACE nova.openstack.common.rpc.amqp raise NotImplementedError()
2014-02-12 11:47:38.557 32416 TRACE nova.openstack.common.rpc.amqp NotImplementedError

Shawn Hartsock (hartsock) wrote :

@Jay Lau

This is a regression; I don't see why it should have a blueprint. The success of live migration is highly dependent on the configuration of your setup.

http://vmwaremine.com/2012/10/21/vmotion-vm-between-vsphere-clusters/#sthash.TJgTTQmI.dpbs

With this in mind, it can be difficult to exercise the feature properly.

Guangya Liu (Jay Lau) (jay-lau-513) wrote :

@Shawn, the link seems to describe how to live migrate using the vSphere client, but what about using OpenStack? How can I live migrate a VM with the OpenStack VCDriver? Thanks.

Matt Riedemann (mriedem) wrote :

Is anyone working on this? It's been sitting here for nearly a year and a half.

Matt Riedemann (mriedem) wrote :

I guess this is still a thing since the hypervisor support matrix calls it out:

http://docs.openstack.org/developer/nova/support-matrix.html#operation_live_migrate

Changed in nova:
importance: Medium → Wishlist
assignee: Sabari Murugesan (smurugesan) → nobody
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/228893
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c9d78d1306241dbcd14d903a97f900baa4a61efc
Submitter: Jenkins
Branch: master

commit c9d78d1306241dbcd14d903a97f900baa4a61efc
Author: Gary Kotton <email address hidden>
Date: Tue Sep 29 06:13:02 2015 -0700

    VMware: raise NotImplementedError for live migration methods

    The VMware VC driver currently does not support live migrations.
    This was supported by the ESX driver but that was removed in the
    Juno cycle.

    In the mean time the driver will raise a NotImplementedError for
    live migrations until we fix this.

    Related-Bug: #1192192

    Change-Id: I20cdbb5892d2e45b0b8f9a658793ab20dbf48fa1

Markus Zoeller (markus_z) (mzoeller) wrote :

The support matrix clarifies that this is not supported and the code raises an exception. The VMWare folks surely know this limitation of their driver. End users and developers are therefore informed. I see no need to keep this bug report open.

Changed in nova:
status: Confirmed → Opinion