resize to different host failed by vcenter driver

Bug #1259389 reported by dingxy
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned
Milestone: none

Bug Description

1. I have an environment that manages two clusters. Make sure that two nova-compute services for VMware exist; in this case they are vmware_nova_cluster01 and vmware_nova_cluster02:
[root@10-1-0-71 nova]# nova-manage service list
Binary Host Zone Status State Updated_At
nova-cells 10-1-0-71 internal enabled :-) 2013-12-06 06:22:48
nova-conductor 10-1-0-71 internal enabled :-) 2013-12-06 06:22:50
nova-scheduler 10-1-0-71 internal enabled :-) 2013-12-06 06:22:42
nova-cert 10-1-0-71 internal enabled :-) 2013-12-06 06:22:43
nova-console 10-1-0-71 internal enabled :-) 2013-12-06 06:22:51
nova-consoleauth 10-1-0-71 internal enabled :-) 2013-12-06 06:22:42
nova-compute 10-1-0-71 nova enabled :-) 2013-12-06 06:22:50
nova-compute vmware_nova nova enabled XXX 2013-11-21 10:03:44
nova-compute vmware_nova_cluster01 nova enabled :-) 2013-12-06 06:22:43
nova-compute vmware_nova_cluster02 nova enabled :-) 2013-12-06 06:22:45

2. Deploy a VM to cluster02:
nova boot --image 4432ca56-8be0-4794-9c22-52a8c7a2bf55 --flavor 1 --availability-zone nova:vmware_nova_cluster02 test
[root@10-1-0-71 nova]# nova list
+--------------------------------------+------+--------+------------+-------------+-------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+------+--------+------------+-------------+-------------------+
| 127db0b8-97e6-4e66-8550-50b219187134 | test | BUILD | spawning | NOSTATE | network1=10.0.1.2 |
+--------------------------------------+------+--------+------------+-------------+-------------------+

3. Resize the instance to flavor 2:
POST: http://10.1.0.71:8774/v2/045dd87f67eb40d1b18f6c9498be3bd9/servers/127db0b8-97e6-4e66-8550-50b219187134/action
{
    "resize" : {
    "flavorRef" :"2"
    }
}
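
The request in this step could be built from Python roughly as follows. This is a minimal sketch using only the standard library: the endpoint, tenant ID and server ID are the ones shown above, while the `build_action_request` helper and the `X-Auth-Token` placeholder value are assumptions made for illustration. The request is only constructed, not sent.

```python
import json
import urllib.request

# Endpoint, tenant ID and server ID copied from the report.
BASE = "http://10.1.0.71:8774/v2/045dd87f67eb40d1b18f6c9498be3bd9"
SERVER_ID = "127db0b8-97e6-4e66-8550-50b219187134"


def build_action_request(body, token):
    """Build (but do not send) a POST to the server's /action URL.

    `token` is a placeholder for a valid auth token (assumption).
    """
    url = "%s/servers/%s/action" % (BASE, SERVER_ID)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


resize_req = build_action_request({"resize": {"flavorRef": "2"}}, token="<token>")
# urllib.request.urlopen(resize_req) would actually send the request.
```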

4. Wait until the instance reaches the VERIFY_RESIZE status, then confirm the resize:
[root@10-1-0-71 nova]# nova list
+--------------------------------------+------+---------------+------------+-------------+-------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+------+---------------+------------+-------------+-------------------+
| 127db0b8-97e6-4e66-8550-50b219187134 | test | VERIFY_RESIZE | None | Running | network1=10.0.1.2 |
+--------------------------------------+------+---------------+------------+-------------+-------------------+
POST: http://10.1.0.71:8774/v2/045dd87f67eb40d1b18f6c9498be3bd9/servers/127db0b8-97e6-4e66-8550-50b219187134/action
{
    "confirmResize" : null
}
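
Waiting for VERIFY_RESIZE before confirming can be scripted along these lines. A sketch only: `wait_for_status` is a hypothetical helper, and `get_status` is a caller-supplied callable that returns the server's current status string (for example by parsing `nova show`); how the status is fetched is deliberately left out.

```python
import time


def wait_for_status(get_status, wanted="VERIFY_RESIZE", timeout=600,
                    interval=5, sleep=time.sleep):
    """Poll get_status() until it returns `wanted`.

    Raises RuntimeError if the instance goes to ERROR or the timeout
    (in seconds of accumulated sleep) is exceeded.
    """
    waited = 0
    while True:
        status = get_status()
        if status == wanted:
            return status
        if status == "ERROR":
            raise RuntimeError("instance went to ERROR while waiting")
        if waited >= timeout:
            raise RuntimeError("timed out waiting for %s" % wanted)
        sleep(interval)
        waited += interval


# Body of the confirm request, as shown in the report.
CONFIRM_BODY = {"confirmResize": None}
```

Only once `wait_for_status` returns would the `confirmResize` action be posted.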

5. The instance goes to ERROR:
[root@10-1-0-71 nova]# nova list
+--------------------------------------+------+--------+------------+-------------+-------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+------+--------+------------+-------------+-------------------+
| 127db0b8-97e6-4e66-8550-50b219187134 | test | ERROR | None | Running | network1=10.0.1.2 |
+--------------------------------------+------+--------+------------+-------------+-------------------+

6. The fault shown for the instance:
| fault | {u'message': u'NV-3AB798A The resource domain-c17(cluster01) does not exist', u'code': 404, u'created': u'2013-12-06T06:17:58Z'} |

7. Messages in the log:
2013-12-06 00:17:58.892 10072 TRACE nova.compute.manager [instance: 127db0b8-97e6-4e66-8550-50b219187134]
2013-12-06 00:17:59.035 10072 ERROR nova.openstack.common.rpc.amqp [req-4d2d3892-0b41-494b-b3c8-f94eca801db8 01dd320eb49d4bdfaa08a9ec021a48d4 045dd87f67eb40d1b18f6c9498be3bd9] Exception during message handling
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line 461, in _process_data
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp **args)
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/exception.py", line 90, in wrapped
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp payload)
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/exception.py", line 73, in wrapped
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp return f(self, context, *args, **kw)
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 306, in decorated_function
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp function(self, context, *args, **kwargs)
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 283, in decorated_function
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp e, sys.exc_info())
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 270, in decorated_function
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 2775, in confirm_resize
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp do_confirm_resize(context, instance, migration.id)
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py", line 248, in inner
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp return f(*args, **kwargs)
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 2773, in do_confirm_resize
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp migration=migration)
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 2799, in _confirm_resize
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp network_info)
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/virt/vmwareapi/driver.py", line 446, in confirm_migration
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp _vmops = self._get_vmops_for_compute_node(instance['node'])
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/virt/vmwareapi/driver.py", line 548, in _get_vmops_for_compute_node
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp resource = self._get_resource_for_node(nodename)
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/virt/vmwareapi/driver.py", line 540, in _get_resource_for_node
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp raise exception.NotFound(msg)
2013-12-06 00:17:59.035 10072 TRACE nova.openstack.common.rpc.amqp NotFound: NV-3AB798A The resource domain-c17(cluster01) does not exist

8. Output of nova host-list and nova hypervisor-list:
[root@10-1-0-71 nova]# nova host-list
+-----------------------+-------------+----------+
| host_name | service | zone |
+-----------------------+-------------+----------+
| 10-1-0-71 | cells | internal |
| 10-1-0-71 | conductor | internal |
| 10-1-0-71 | scheduler | internal |
| 10-1-0-71 | cert | internal |
| 10-1-0-71 | console | internal |
| 10-1-0-71 | consoleauth | internal |
| 10-1-0-71 | compute | nova |
| vmware_nova | compute | nova |
| vmware_nova_cluster01 | compute | nova |
| vmware_nova_cluster02 | compute | nova |
+-----------------------+-------------+----------+
[root@10-1-0-71 nova]# nova hypervisor-list
+----+------------------------+
| ID | Hypervisor hostname |
+----+------------------------+
| 1 | 10-1-0-71 |
| 7 | domain-c17(cluster01) |
| 8 | domain-c382(cluster02) |
+----+------------------------+
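
The hypervisor hostnames above appear to follow a `<managed object ID>(<cluster name>)` pattern, whereas the host names in host-list are the service names. A small sketch, assuming that pattern holds, of splitting such a node name into its two parts; `parse_node_name` is a hypothetical helper, not part of nova.

```python
import re

# Assumed pattern: "<moref>(<cluster name>)", e.g. "domain-c17(cluster01)".
# The parentheses are literal characters, so they are escaped in the regex.
_NODE_RE = re.compile(r"^(?P<moref>[^()]+)\((?P<name>[^()]*)\)$")


def parse_node_name(nodename):
    """Return (moref, cluster_name), or raise ValueError if it does not match."""
    m = _NODE_RE.match(nodename)
    if m is None:
        raise ValueError("unexpected node name: %r" % nodename)
    return m.group("moref"), m.group("name")


print(parse_node_name("domain-c17(cluster01)"))
print(parse_node_name("domain-c382(cluster02)"))
```

A service host name such as vmware_nova_cluster01 does not match this pattern, which is one way to see that the two listings use different naming schemes.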

9. I tried changing nova.conf so that host-list and hypervisor-list display the same host name, e.g. domain-c17(cluster01). However, the string contains "(" and ")", and OpenStack does not seem to recognize "\(" or "\)", so this approach failed.

Tags: vmware
Maithem (maithem) wrote :

In step 3, are you resizing the VM while it is spawning, or do you wait until the operation is complete?

dingxy (xyding) wrote :

Yes, I waited until the operation completed. I confirmed the resize only after the instance reached the VERIFY_RESIZE status.

zhu zhu (zhuzhubj) wrote :

This is interesting. My understanding is that "resize to a different host" means a different ESXi host. I am not certain whether OpenStack currently supports resizing across clusters.

Looking at the code, there is "host_ref = self._get_host_ref_from_name(dest)", and I do not think a cluster fits into this category.

    def migrate_disk_and_power_off(self, context, instance, dest,
                                   flavor):
        """
        Transfers the disk of a running instance in multiple phases, turning
        off the instance before the end.
        """
        # 0. Zero out the progress to begin
        self._update_instance_progress(context, instance,
                                       step=0,
                                       total_steps=RESIZE_TOTAL_STEPS)

        vm_ref = vm_util.get_vm_ref(self._session, instance)
        # Read the host_ref for the destination. If this is None then the
        # VC will decide on placement
        host_ref = self._get_host_ref_from_name(dest)

        # 1. Power off the instance
        self.power_off(instance)
        self._update_instance_progress(context, instance,
                                       step=1,
                                       total_steps=RESIZE_TOTAL_STEPS)

        # 2. Rename the original VM with suffix '-orig'
        name_label = self._get_orig_vm_name_label(instance)
        LOG.debug(_("Renaming the VM to %s") % name_label,
                  instance=instance)
        rename_task = self._session._call_method(
                            self._session._get_vim(),
                            "Rename_Task", vm_ref, newName=name_label)
        self._session._wait_for_task(instance['uuid'], rename_task)
        LOG.debug(_("Renamed the VM to %s") % name_label,
                  instance=instance)
        self._update_instance_progress(context, instance,
                                       step=2,
                                       total_steps=RESIZE_TOTAL_STEPS)

        # Get the clone vm spec
        ds_ref = vm_util.get_datastore_ref_and_name(
                            self._session, self._cluster, host_ref,
                            datastore_regex=self._datastore_regex)[0]
        client_factory = self._session._get_vim().client.factory
        rel_spec = vm_util.relocate_vm_spec(client_factory, ds_ref, host_ref)
        clone_spec = vm_util.clone_vm_spec(client_factory, rel_spec)
        vm_folder_ref = self._get_vmfolder_ref()

        # 3. Clone VM on ESX host
        LOG.debug(_("Cloning VM to host %s") % dest, instance=instance)
        vm_clone_task = self._session._call_method(
                                self._session._get_vim(),
                                "CloneVM_Task", vm_ref,
                                folder=vm_folder_ref,
                                name=instance['uuid'],
                                spec=clone_spec)
        self._session._wait_for_task(instance['uuid'], vm_clone_task)
        LOG.debug(_("Cloned VM to host %s") % dest, instance=instance)
        self._update_instance_progress(context, instance,
                                       step=3,
                                       total_steps=RESIZE_TOTAL_STEPS...
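
To illustrate the concern: if `dest` is the name of a nova-compute service that fronts a whole cluster rather than the name of an ESXi host, a name-based host lookup finds nothing and returns None, in which case (per the comment in the code above) VC decides on placement. A toy sketch; the host inventory and the `get_host_ref_from_name` function below are stand-ins invented for illustration, not the driver's implementation.

```python
# Hypothetical inventory of ESXi hosts in one cluster: host name -> reference.
ESX_HOSTS = {"esx-01.example.com": "host-21", "esx-02.example.com": "host-22"}


def get_host_ref_from_name(name, hosts=ESX_HOSTS):
    """Stand-in lookup: return the host reference, or None when no ESXi
    host has that name."""
    return hosts.get(name)


# An ESXi host name resolves to a reference ...
print(get_host_ref_from_name("esx-01.example.com"))
# ... but a compute-service (cluster) name does not, so the result is None.
print(get_host_ref_from_name("vmware_nova_cluster01"))
```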


Gary Kotton (garyk)
tags: added: vmware
ChangBo Guo(gcb) (glongwave) wrote :

According to https://wiki.openstack.org/wiki/HypervisorSupportMatrix, resize on ESXi/VC still needs more testing before it can be judged supported or not.

A similar bug was fixed before: https://bugs.launchpad.net/nova/+bug/1199954.

ChangBo Guo(gcb) (glongwave) wrote :

It appears that None is returned at https://github.com/openstack/nova/blob/master/nova/virt/vmwareapi/driver.py#L538.
There is a race condition on the dict self._resources:
the method at https://github.com/openstack/nova/blob/master/nova/virt/vmwareapi/driver.py#L483 updates self._resources,
and that method is called periodically from https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4973.

This needs deeper investigation.
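
The suspected race can be illustrated in isolation. The sketch below is a toy model, not nova code: it assumes, hypothetically, that a periodic refresh empties the dict before repopulating it, so a lookup that lands in between finds no entry. Building the replacement first and swapping it in closes that window. The `between` hook stands in for a concurrent reader.

```python
class Resources:
    """Toy stand-in for a driver's node -> resource mapping."""

    def __init__(self, nodes):
        self._resources = dict(nodes)

    def get(self, nodename):
        return self._resources.get(nodename)

    def refresh_in_place(self, nodes, between=lambda: None):
        # Clear first, then repopulate: a reader running at `between`
        # observes an empty mapping.
        self._resources.clear()
        between()
        self._resources.update(nodes)

    def refresh_by_swap(self, nodes, between=lambda: None):
        # Build the replacement first, then rebind the attribute in one step.
        new = dict(nodes)
        between()
        self._resources = new


NODE = "domain-c17(cluster01)"
nodes = {NODE: "cluster01-resource"}
seen = []

r = Resources(nodes)
r.refresh_in_place(nodes, between=lambda: seen.append(r.get(NODE)))
r.refresh_by_swap(nodes, between=lambda: seen.append(r.get(NODE)))
print(seen)
```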

ChangBo Guo(gcb) (glongwave) wrote :

I did more debugging:
1) The nova scheduler did choose another compute host as the destination, and the instance's host was changed to the destination once the instance was in the VERIFY_RESIZE state. But in vCenter the instance was still on the original ESXi host (actually there are two instances, one with the old name and one with the suffix _orig). This is not right.

2) After the resize, the instance's host name was changed to one that is not managed by this nova-compute service. On resize confirm, nova-compute cannot find the changed host name, so the error is raised.
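
The failure described in these two points can be modelled in a few lines. This is a simplified sketch, not the driver's code: each nova-compute service is assumed to know only its own cluster's node, and the class and method names are hypothetical. The values are taken from the original report, where the instance's node at confirm time is domain-c17(cluster01) and the confirm call is handled by a service that does not manage that node.

```python
class NotFound(Exception):
    pass


class ComputeService:
    """Toy model of one nova-compute service bound to a single cluster."""

    def __init__(self, host, node):
        self.host = host
        self._resources = {node: "resource for %s" % node}

    def get_resource_for_node(self, nodename):
        resource = self._resources.get(nodename)
        if resource is None:
            raise NotFound("The resource %s does not exist" % nodename)
        return resource

    def confirm_migration(self, instance):
        # Looks the node up in this service's own mapping only.
        return self.get_resource_for_node(instance["node"])


cluster01 = ComputeService("vmware_nova_cluster01", "domain-c17(cluster01)")
cluster02 = ComputeService("vmware_nova_cluster02", "domain-c382(cluster02)")

# After the resize, the instance record points at the destination node.
instance = {"uuid": "127db0b8-97e6-4e66-8550-50b219187134",
            "node": "domain-c17(cluster01)"}

try:
    cluster02.confirm_migration(instance)
except NotFound as exc:
    print("confirm failed:", exc)
```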

Before resize
[root@10-1-0-71 nova]# nova show gcb_resize1
+--------------------------------------+----------------------------------------------------------+
| Property | Value |
+--------------------------------------+----------------------------------------------------------+
| status | ACTIVE |
| updated | 2013-12-18T07:42:27Z |
| OS-EXT-STS:task_state | None |
| OS-EXT-SRV-ATTR:host | vmware_nova_cluster01 |
| key_name | None |
| image | trend-thin (4432ca56-8be0-4794-9c22-52a8c7a2bf55) |
| network1 network | 10.0.1.2 |
| hostId | 964f8e94c6b9084b0059044896687b96e12dd97a33d61deb564244ee |
| OS-EXT-STS:vm_state | active |
| OS-EXT-SRV-ATTR:instance_name | instance-000000ca |
| OS-SRV-USG:launched_at | 2013-12-18T07:42:27.000000 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | domain-c17(cluster01) |
| flavor | m1.tiny (1) |
| id | 2f93a76e-fcd7-4fe2-96c6-1a750b430f3c |
| security_groups | [{u'name': u'default'}] |
| OS-SRV-USG:terminated_at | None |
| user_id | 01dd320eb49d4bdfaa08a9ec021a48d4 |
| name | gcb_resize1 |
| created | 2013-12-18T07:41:38Z |
| tenant_id | 045dd87f67eb40d1b18f6c9498be3bd9 |
| OS-DCF:diskConfig | MANUAL |
| metadata | {} |
| os-extended-volumes:volumes_attached | [] ...


Tracy Jones (tjones-i)
no longer affects: openstack-vmwareapi-team
Tracy Jones (tjones-i)
Changed in nova:
status: New → Incomplete
status: Incomplete → Confirmed
milestone: none → next
Tracy Jones (tjones-i)
Changed in nova:
importance: Undecided → Medium
Changed in nova:
milestone: next → none