nova failures when vCenter has multiple datacenters

Bug #1180044 reported by Shawn Hartsock on 2013-05-14
This bug affects 6 people
Affects                    Importance   Assigned to
OpenStack Compute (nova)   High         Gary Kotton
  Havana series            High         Gary Kotton
VMwareAPI-Team             Critical     Gary Kotton
nova (Ubuntu)              Undecided    Unassigned

Bug Description

The method _get_datacenter_ref_and_name in vmops.py does not calculate the datacenter properly.

    def _get_datacenter_ref_and_name(self):
        """Get the datacenter name and the reference."""
        dc_obj = self._session._call_method(vim_util, "get_objects",
                "Datacenter", ["name"])
        vm_util._cancel_retrieve_if_necessary(self._session, dc_obj)
        return dc_obj.objects[0].obj, dc_obj.objects[0].propSet[0].val

This will not be correct on systems with more than one datacenter.
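A minimal sketch of the failure mode, using stand-in objects rather than the real vim_util/session API: `get_objects` returns Datacenter objects in arbitrary order, so indexing `objects[0]` picks an arbitrary datacenter. The hypothetical `pick_datacenter` helper below shows the kind of selection the driver would need instead, choosing the datacenter that actually contains the target datastore.

```python
# Stand-in objects, not the real vim_util/session API: illustrates why
# returning dc_obj.objects[0] breaks with multiple datacenters.
from collections import namedtuple

Datacenter = namedtuple("Datacenter", ["name", "datastores"])

def pick_datacenter(datacenters, target_datastore):
    """Select the datacenter whose inventory contains target_datastore."""
    for dc in datacenters:
        if target_datastore in dc.datastores:
            return dc
    raise ValueError("datastore %s not found in any datacenter" % target_datastore)

# Two datacenters; suppose the scheduler chose a datastore in the *second* one.
dcs = [Datacenter("Cleveland", {"ds-cle-1"}),
       Datacenter("Munich", {"ds-muc-1"})]

buggy = dcs[0]                           # what objects[0] effectively does
fixed = pick_datacenter(dcs, "ds-muc-1")
print(buggy.name, fixed.name)            # Cleveland Munich
```

With the buggy selection, the CreateVM spec mixes the Cleveland vmFolder with a Munich datastore, which is exactly the "entities that did not belong to the same datacenter" error below.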

Stack trace from logs:

ERROR nova.compute.manager [req-9395fe41-cf04-4434-bd77-663e93de1d4a foo bar] [instance: 484a42a2-642e-4594-93fe-4f72ddad361f] Error: ['Traceback (most recent call last):\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 942, in _build_instance\n set_access_ip=set_access_ip)\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 1204, in _spawn\n LOG.exception(_(\'Instance failed to spawn\'), instance=instance)\n', ' File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__\n self.gen.next()\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 1200, in _spawn\n block_device_info)\n', ' File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 176, in spawn\n block_device_info)\n', ' File "/opt/stack/nova/nova/virt/vmwareapi/vmops.py", line 208, in spawn\n _execute_create_vm()\n', ' File "/opt/stack/nova/nova/virt/vmwareapi/vmops.py", line 204, in _execute_create_vm\n self._session._wait_for_task(instance[\'uuid\'], vm_create_task)\n', ' File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 559, in _wait_for_task\n ret_val = done.wait()\n', ' File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait\n return hubs.get_hub().switch()\n', ' File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch\n return self.greenlet.switch()\n', 'NovaException: A specified parameter was not correct. \nspec.location.folder\n']

vCenter error is:
"A specified parameter was not correct. spec.location.folder"

Workaround:
use only one datacenter, use only one cluster, and turn on DRS.

Additional failures:
2013-07-18 10:59:12.788 DEBUG nova.virt.vmwareapi.vmware_images [req-e8306ffe-c6c7-4d0f-a466-fb532375cbd3 7799f10ca7da47f3b2660feb363b370b 0e1771f8db984a3599596fae62609d9a] [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] Got image size of 687865856 for the image cde14862-60b8-4360-a145-06585b06577c get_vmdk_size_and_properties /usr/lib/python2.7/dist-packages/nova/virt/vmwareapi/vmware_images.py:156
2013-07-18 10:59:12.963 WARNING nova.virt.vmwareapi.network_util [req-e8306ffe-c6c7-4d0f-a466-fb532375cbd3 7799f10ca7da47f3b2660feb363b370b 0e1771f8db984a3599596fae62609d9a] [(ManagedObjectReference){
   value = "network-1501"
   _type = "Network"
 }, (ManagedObjectReference){
   value = "network-1458"
   _type = "Network"
 }, (ManagedObjectReference){
   value = "network-2085"
   _type = "Network"
 }, (ManagedObjectReference){
   value = "network-1143"
   _type = "Network"
 }]
2013-07-18 10:59:13.326 DEBUG nova.virt.vmwareapi.vmops [req-e8306ffe-c6c7-4d0f-a466-fb532375cbd3 7799f10ca7da47f3b2660feb363b370b 0e1771f8db984a3599596fae62609d9a] [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] Creating VM on the ESX host _execute_create_vm /usr/lib/python2.7/dist-packages/nova/virt/vmwareapi/vmops.py:207
2013-07-18 10:59:14.258 3145 DEBUG nova.openstack.common.rpc.amqp [-] Making synchronous call on conductor ... multicall /usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py:583
2013-07-18 10:59:14.259 3145 DEBUG nova.openstack.common.rpc.amqp [-] MSG_ID is 8ef36d061a9341a09d3a5451df798673 multicall /usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py:586
2013-07-18 10:59:14.259 3145 DEBUG nova.openstack.common.rpc.amqp [-] UNIQUE_ID is 680b790574c64a9783fd2138c43f5f6d. _add_unique_id /usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py:337
2013-07-18 10:59:18.757 3145 WARNING nova.virt.vmwareapi.driver [-] Task [CreateVM_Task] (returnval){
   value = "task-33558"
   _type = "Task"
 } status: error The input arguments had entities that did not belong to the same datacenter.

2013-07-18 10:59:18.758 ERROR nova.compute.manager [req-e8306ffe-c6c7-4d0f-a466-fb532375cbd3 7799f10ca7da47f3b2660feb363b370b 0e1771f8db984a3599596fae62609d9a] [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] Instance failed to spawn
2013-07-18 10:59:18.758 3145 TRACE nova.compute.manager [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] Traceback (most recent call last):
2013-07-18 10:59:18.758 3145 TRACE nova.compute.manager [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1103, in _spawn
2013-07-18 10:59:18.758 3145 TRACE nova.compute.manager [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] block_device_info)
2013-07-18 10:59:18.758 3145 TRACE nova.compute.manager [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] File "/usr/lib/python2.7/dist-packages/nova/virt/vmwareapi/driver.py", line 177, in spawn
2013-07-18 10:59:18.758 3145 TRACE nova.compute.manager [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] block_device_info)
2013-07-18 10:59:18.758 3145 TRACE nova.compute.manager [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] File "/usr/lib/python2.7/dist-packages/nova/virt/vmwareapi/vmops.py", line 217, in spawn
2013-07-18 10:59:18.758 3145 TRACE nova.compute.manager [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] _execute_create_vm()
2013-07-18 10:59:18.758 3145 TRACE nova.compute.manager [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] File "/usr/lib/python2.7/dist-packages/nova/virt/vmwareapi/vmops.py", line 213, in _execute_create_vm
2013-07-18 10:59:18.758 3145 TRACE nova.compute.manager [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] self._session._wait_for_task(instance['uuid'], vm_create_task)
2013-07-18 10:59:18.758 3145 TRACE nova.compute.manager [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] File "/usr/lib/python2.7/dist-packages/nova/virt/vmwareapi/driver.py", line 554, in _wait_for_task
2013-07-18 10:59:18.758 3145 TRACE nova.compute.manager [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] ret_val = done.wait()
2013-07-18 10:59:18.758 3145 TRACE nova.compute.manager [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
2013-07-18 10:59:18.758 3145 TRACE nova.compute.manager [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] return hubs.get_hub().switch()
2013-07-18 10:59:18.758 3145 TRACE nova.compute.manager [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch
2013-07-18 10:59:18.758 3145 TRACE nova.compute.manager [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] return self.greenlet.switch()
2013-07-18 10:59:18.758 3145 TRACE nova.compute.manager [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] NovaException: The input arguments had entities that did not belong to the same datacenter.
2013-07-18 10:59:18.758 3145 TRACE nova.compute.manager [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539]

2013-07-18 10:59:20.029 ERROR nova.compute.manager [req-e8306ffe-c6c7-4d0f-a466-fb532375cbd3 7799f10ca7da47f3b2660feb363b370b 0e1771f8db984a3599596fae62609d9a] [instance: 5b3961b6-38d9-409c-881e-fe50f67b1539] Error: ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 848, in _run_instance\n set_access_ip=set_access_ip)\n', ' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1107, in _spawn\n LOG.exception(_(\'Instance failed to spawn\'), instance=instance)\n', ' File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__\n self.gen.next()\n', ' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1103, in _spawn\n block_device_info)\n', ' File "/usr/lib/python2.7/dist-packages/nova/virt/vmwareapi/driver.py", line 177, in spawn\n block_device_info)\n', ' File "/usr/lib/python2.7/dist-packages/nova/virt/vmwareapi/vmops.py", line 217, in spawn\n _execute_create_vm()\n', ' File "/usr/lib/python2.7/dist-packages/nova/virt/vmwareapi/vmops.py", line 213, in _execute_create_vm\n self._session._wait_for_task(instance[\'uuid\'], vm_create_task)\n', ' File "/usr/lib/python2.7/dist-packages/nova/virt/vmwareapi/driver.py", line 554, in _wait_for_task\n ret_val = done.wait()\n', ' File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait\n return hubs.get_hub().switch()\n', ' File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch\n return self.greenlet.switch()\n', 'NovaException: The input arguments had entities that did not belong to the same datacenter.\n']

2013-07-18 10:59:23.831 3145 WARNING nova.virt.vmwareapi.driver [-] Task [CreateVM_Task] (returnval){
   value = "task-33558"
   _type = "Task"
 } status: error The input arguments had entities that did not belong to the same datacenter.
2013-07-18 10:59:23.832 3145 WARNING nova.virt.vmwareapi.driver [-] In vmwareapi:_poll_task, Got this error Trying to re-send() an already-triggered event.
2013-07-18 10:59:23.833 3145 ERROR nova.utils [-] in fixed duration looping call
2013-07-18 10:59:23.833 3145 TRACE nova.utils Traceback (most recent call last):
2013-07-18 10:59:23.833 3145 TRACE nova.utils File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 594, in _inner
2013-07-18 10:59:23.833 3145 TRACE nova.utils self.f(*self.args, **self.kw)
2013-07-18 10:59:23.833 3145 TRACE nova.utils File "/usr/lib/python2.7/dist-packages/nova/virt/vmwareapi/driver.py", line 580, in _poll_task
2013-07-18 10:59:23.833 3145 TRACE nova.utils done.send_exception(excep)
2013-07-18 10:59:23.833 3145 TRACE nova.utils File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 208, in send_exception
2013-07-18 10:59:23.833 3145 TRACE nova.utils return self.send(None, args)
2013-07-18 10:59:23.833 3145 TRACE nova.utils File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 150, in send
2013-07-18 10:59:23.833 3145 TRACE nova.utils assert self._result is NOT_USED, 'Trying to re-send() an already-triggered event.'
2013-07-18 10:59:23.833 3145 TRACE nova.utils AssertionError: Trying to re-send() an already-triggered event.
2013-07-18 10:59:23.833 3145 TRACE nova.utils

summary: - nova boot fails when a VMware vCenter managed data center is empty
+ nova boot fails when any VMware vCenter managed datacenter object is
+ empty
description: updated
Changed in nova:
assignee: nobody → Shawn Hartsock (hartsock)
summary: - nova boot fails when any VMware vCenter managed datacenter object is
- empty
+ nova boot fails when any VMware vCenter managed datacenter or container
+ object is empty
Michael Still (mikal) on 2013-05-22
Changed in nova:
status: New → Confirmed
importance: Undecided → High

I tried to reproduce this by taking a working setup with a single datacenter and adding a second, empty datacenter. Even when I restacked the environment, I did not see an error, so there must be slightly more to it than described.

description: updated
Changed in nova:
milestone: none → havana-2
Shawn Hartsock (hartsock) wrote :

This probably has something to do with how "spec.location.folder" is calculated. It probably doesn't manifest in all situations.

Changed in nova:
status: Confirmed → In Progress
summary: - nova boot fails when any VMware vCenter managed datacenter or container
- object is empty
+ nova boot fails with multiple vCenter managed datacenters

I can triage and work on this issue.

Changed in nova:
assignee: Shawn Hartsock (hartsock) → Sabari Kumar Murugesan (smurugesan)
Changed in nova:
importance: High → Critical
tags: added: grizzly-backport-potential
description: updated
Changed in nova:
assignee: Sabari Kumar Murugesan (smurugesan) → nobody
assignee: nobody → Shawn Hartsock (hartsock)
description: updated
Shawn Hartsock (hartsock) wrote :

Root cause:
https://github.com/openstack/nova/blob/master/nova/virt/vmwareapi/vmops.py#L200

Does not specify HOST!
http://pubs.vmware.com/vsphere-51/index.jsp#com.vmware.wssdk.apiref.doc/vim.Folder.html#createVm

Reference:
"The target host on which the virtual machine will run. This must specify a host that is a member of the ComputeResource indirectly specified by the pool. For a stand-alone host or a cluster with DRS, host can be omitted, and the system selects a default."

description: updated
description: updated
description: updated
description: updated
description: updated
summary: - nova boot fails with multiple vCenter managed datacenters
+ nova boot fails when vCenter has multiple managed stand-alone hosts and
+ no clear default host

Lowering priority since there is a workaround.

summary: - nova boot fails when vCenter has multiple managed stand-alone hosts and
- no clear default host
+ nova boot fails when vCenter has multiple managed hosts and no clear
+ default host
Changed in nova:
importance: Critical → High
Changed in nova:
importance: High → Critical
Thierry Carrez (ttx) wrote :

Please keep "Critical" for serious regressions / data loss issues / issues affecting all Nova users.
See https://wiki.openstack.org/wiki/Bugs for guidelines.

Changed in nova:
importance: Critical → High
Changed in nova:
importance: High → Critical
importance: Critical → High
Shawn Hartsock (hartsock) wrote :

There is a workaround, so yes, this is no longer critical.

Changed in nova:
importance: High → Critical
importance: Critical → High
Changed in nova:
milestone: havana-2 → havana-3
Changed in nova:
importance: High → Low
status: In Progress → Triaged

See also: https://bugs.launchpad.net/nova/+bug/1208906

These are related problems due to how the driver handles inventory hierarchy.

summary: - nova boot fails when vCenter has multiple managed hosts and no clear
- default host
+ nova boot fails when vCenter has multiple datacenters, managed hosts, or
+ clusters and no clear default host
summary: - nova boot fails when vCenter has multiple datacenters, managed hosts, or
+ nova failures when vCenter multiple datacenters, managed hosts, or
clusters and no clear default host
Changed in nova:
importance: Low → High
status: Triaged → In Progress
summary: - nova failures when vCenter multiple datacenters, managed hosts, or
+ nova failures when vCenter has multiple datacenters, managed hosts, or
clusters and no clear default host
description: updated


summary: - nova failures when vCenter has multiple datacenters, managed hosts, or
- clusters and no clear default host
+ nova failures when vCenter has multiple datacenters
description: updated
Shawn Hartsock (hartsock) wrote :

Gary, please take these two patches and run them down.

Changed in nova:
assignee: Shawn Hartsock (hartsock) → Gary Kotton (garyk)
Changed in nova:
assignee: Gary Kotton (garyk) → nobody
assignee: nobody → Vui Lam (vui)
Thierry Carrez (ttx) on 2013-09-05
Changed in nova:
milestone: havana-3 → havana-rc1
Changed in nova:
importance: High → Medium
Russell Bryant (russellb) wrote :

Since the current patch is a WIP, I'm going to put this on the potential list for the havana RC

tags: added: havana-rc-potential
Changed in nova:
milestone: havana-rc1 → none
Changed in nova:
assignee: Vui Lam (vui) → Shawn Hartsock (hartsock)
Thierry Carrez (ttx) on 2013-10-14
tags: added: havana-backport-potential
removed: havana-rc-potential
Gary Kotton (garyk) on 2013-10-17
Changed in nova:
importance: Medium → High
Gary Kotton (garyk) wrote :

https://review.openstack.org/#/c/52630/ (No idea why Jenkins does not update the bug automatically)

Changed in nova:
assignee: Shawn Hartsock (hartsock) → Gary Kotton (garyk)
milestone: none → icehouse-1
Shawn Hartsock (hartsock) wrote :

cut-and-paste of comments posted on review. I want to be sure to preserve the knowledge I gained from working this bug for several weeks. By all means, Gary please finish the work. Thank you.

Conceptually, all VMs in a vCenter are placed under a Datacenter, and each datacenter has a vmFolder that holds all instances. A datacenter (at least conceptually) represents a segregated hunk of hardware, so when you ask about all the virtual machines in a datacenter in Cleveland, it makes no sense to talk about the ones in Munich (an exaggerated example; nobody really does that).

So selecting the correct datacenter paths is pretty important. The vmFolder happens to be an implicit, hidden path in the VM placement specification; even so, Nova must tell vCenter "the scheduler said place it here", where "here" is the vmFolder, datastore, etc. Since the current scheduler only supplies the datastore coordinate, we can extrapolate which datacenter the scheduler meant by calculating the nearest in-tree vmFolder path.
Currently the driver completely ignores all of this, and we tell people to just use one datacenter so that the calculation cannot possibly get messed up (with more than one, the path calculation can be wrong). This is a pretty major driver change that we need to get right.

We need to be sure that we encode the tree traversals correctly so that the scheduler's intentions are properly translated into vCenter commands. That is not happening right now.
I will pull this for manual testing later this week, but I can already tell by looking that you will still fail when there is more than one datacenter and the intended datacenter is not *first*, i.e. at position 0.
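The traversal described in that comment can be sketched as follows, using a toy dict-based inventory rather than the real managed-object API: starting from the only coordinate the scheduler supplies (the datastore), walk parent links up the inventory tree to the owning Datacenter and take its vmFolder, so that folder, datastore, and host in the CreateVM spec all belong to the same datacenter.

```python
# Toy inventory tree (hypothetical IDs, not real managed-object references):
# each node records its parent; Datacenter nodes also record their vmFolder.
inventory = {
    "ds-muc-1":     {"parent": "dc-munich"},
    "dc-munich":    {"parent": "root", "type": "Datacenter", "vm_folder": "group-v-muc"},
    "ds-cle-1":     {"parent": "dc-cleveland"},
    "dc-cleveland": {"parent": "root", "type": "Datacenter", "vm_folder": "group-v-cle"},
}

def vm_folder_for_datastore(datastore_id):
    """Walk parent links from a datastore up to its Datacenter's vmFolder."""
    node_id = datastore_id
    while node_id != "root":
        node = inventory[node_id]
        if node.get("type") == "Datacenter":
            return node["vm_folder"]
        node_id = node["parent"]
    raise LookupError("no datacenter above %s" % datastore_id)

print(vm_folder_for_datastore("ds-muc-1"))  # group-v-muc
```

Because the lookup is anchored on the scheduler-selected datastore rather than on whichever datacenter happens to come back first, it works regardless of the datacenter's position in the inventory listing.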

Changed in nova:
milestone: icehouse-1 → icehouse-2
Gary Kotton (garyk) on 2013-12-05
Changed in openstack-vmwareapi-team:
importance: Undecided → Critical
Gary Kotton (garyk) on 2013-12-08
Changed in openstack-vmwareapi-team:
assignee: nobody → Gary Kotton (garyk)

Reviewed: https://review.openstack.org/52630
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a25b2ac5f440f7ace4678b21ada6ebf5ce5dff3c
Submitter: Jenkins
Branch: master

commit a25b2ac5f440f7ace4678b21ada6ebf5ce5dff3c
Author: Gary Kotton <email address hidden>
Date: Fri Oct 18 06:12:40 2013 -0700

    VMware: fix bug when more than one datacenter exists

    In the case that there was more than one datacenter defined on the VC,
    then spawning an instance would result in an exception. The reason for this
    was that the nova compute would not set the correct datacenter for the
    selected datastore.

    The fix also takes care of the correct folder selection. This too was a
    result of not selecting the correct folder for the data center.

    The 'fake' configuration was updated to contain an additional data
    center with its on datastore.

    Closes-Bug: #1180044
    Closes-Bug: #1214850

    Co-authored-by: Shawn Harsock <email address hidden>

    Change-Id: Ib61811fffcbc80385efc3166c9e366fdaa6432bd

Changed in nova:
status: In Progress → Fix Committed
Tracy Jones (tjones-i) on 2014-01-10
Changed in openstack-vmwareapi-team:
status: New → Fix Committed
Thierry Carrez (ttx) on 2014-01-22
Changed in nova:
status: Fix Committed → Fix Released
Alan Pevec (apevec) on 2014-01-23
tags: removed: havana-backport-potential
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nova (Ubuntu):
status: New → Confirmed
Alan Pevec (apevec) on 2014-03-30
tags: removed: grizzly-backport-potential
Thierry Carrez (ttx) on 2014-04-17
Changed in nova:
milestone: icehouse-2 → 2014.1
James Page (james-page) wrote :

Icehouse was released with Ubuntu 14.04 - marking fix-released.

Changed in nova (Ubuntu):
status: Confirmed → Fix Released