nova boot fails to attach vmdk in multi-host environments without DRS properly enabled

Bug #1180897 reported by Shawn Hartsock
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Medium
Assigned to: Shawn Hartsock
Milestone: (none)

Bug Description

When a VMware vCenter manages more than one ESXi host, the nova boot command will fail at the point where the image (VMDK) is being attached to the VM. The error is "file is not found". Inspection of the datastore (you have to suspend or halt the nova process before it performs cleanup activities to observe this) shows that the VMDK was properly placed in the shared datastore, but that the host may not be able to see the path at which the VMDK was stored. If you move the VMDK and attach it using vSphere's own management tools, the VM will recover.

* If you have only one host in the cluster, this problem goes away.
* If you have only one host in the vCenter, this problem goes away.
* If you have DRS with automatic placement turned on, the problem goes away.
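
To see whether a given host can actually reach the datastore nova picked, you can ask vCenter which hosts have that datastore mounted and accessible. A minimal sketch using pyVmomi (illustrative only; nova's vmwareapi driver uses its own suds-based bindings, and the connection details here are placeholders):

    # Illustrative pyVmomi sketch: list which ESXi hosts can see each
    # datastore. Host and credentials are placeholders.
    from pyVim.connect import SmartConnect, Disconnect

    si = SmartConnect(host='vcenter.example.com',
                      user='administrator', pwd='secret')
    try:
        content = si.RetrieveContent()
        for dc in content.rootFolder.childEntity:      # Datacenter objects
            for ds in getattr(dc, 'datastore', []):
                # ds.host is a list of DatastoreHostMount entries: every
                # host that has this datastore mounted, with mount state.
                visible = [m.key.name for m in ds.host
                           if m.mountInfo.accessible]
                print('%s visible to: %s' % (ds.name, visible))
    finally:
        Disconnect(si)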

2013-05-16 09:22:29.473 ERROR nova.compute.manager [req-9e61185b-5444-4f97-b711-6c65b716a2a0 demo demo] [instance: aef16488-88c7-4952-99d4-f55377c410e9] Error: ['Traceback (most recent call last):\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 941, in _build_instance\n set_access_ip=set_access_ip)\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 1203, in _spawn\n LOG.exception(_(\'Instance failed to spawn\'), instance=instance)\n', ' File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__\n self.gen.next()\n', ' File "/opt/stack/nova/nova/compute/manager.py", line 1199, in _spawn\n block_device_info)\n', ' File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 176, in spawn\n block_device_info)\n', ' File "/opt/stack/nova/nova/virt/vmwareapi/vmops.py", line 405, in spawn\n vmdk_file_size_in_kb, linked_clone)\n', ' File "/opt/stack/nova/nova/virt/vmwareapi/volumeops.py", line 68, in attach_disk_to_vm\n self._session._wait_for_task(instance_name, reconfig_task)\n', ' File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 559, in _wait_for_task\n ret_val = done.wait()\n', ' File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait\n return hubs.get_hub().switch()\n', ' File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch\n return self.greenlet.switch()\n', "NovaException: Invalid configuration for device '1'.\n"]
2013-05-16 09:22:29.474 DEBUG nova.openstack.common.rpc.amqp [req-9e61185b-5444-4f97-b711-6c65b716a2a0 demo demo] Making synchronous call on conductor ... from (pid=32776) multicall /opt/stack/nova/nova/openstack/common/rpc/amqp.py:586
2013-05-16 09:22:29.474 DEBUG nova.openstack.common.rpc.amqp [req-9e61185b-5444-4f97-b711-6c65b716a2a0 demo demo] MSG_ID is ed357119f3cf4d5c82f1679808a5f185 from (pid=32776) multicall /opt/stack/nova/nova/openstack/common/rpc/amqp.py:589
2013-05-16 09:22:29.474 DEBUG nova.openstack.common.rpc.amqp [req-9e61185b-5444-4f97-b711-6c65b716a2a0 demo demo] UNIQUE_ID is e63169e368eb4b9cb5121c801e640d9d. from (pid=32776) _add_unique_id /opt/stack/nova/nova/openstack/common/rpc/amqp.py:337
2013-05-16 09:22:29.475 DEBUG amqp [-] Closed channel #1 from (pid=32776) _do_close /usr/local/lib/python2.7/dist-packages/amqp/channel.py:88
2013-05-16 09:22:29.475 DEBUG amqp [-] using channel_id: 1 from (pid=32776) __init__ /usr/local/lib/python2.7/dist-packages/amqp/channel.py:70
2013-05-16 09:22:29.476 DEBUG amqp [-] Channel open from (pid=32776) _open_ok /usr/local/lib/python2.7/dist-packages/amqp/channel.py:420
2013-05-16 09:22:29.477 DEBUG nova.openstack.common.periodic_task [-] Running periodic task ComputeManager._poll_rebooting_instances from (pid=32776) run_periodic_tasks /opt/stack/nova/nova/openstack/common/periodic_task.py:175
2013-05-16 09:22:29.478 DEBUG nova.openstack.common.periodic_task [-] Running periodic task ComputeManager._reclaim_queued_deletes from (pid=32776) run_periodic_tasks /opt/stack/nova/nova/openstack/common/periodic_task.py:175
2013-05-16 09:22:29.479 DEBUG nova.compute.manager [-] CONF.reclaim_instance_interval <= 0, skipping... from (pid=32776) _reclaim_queued_deletes /opt/stack/nova/nova/compute/manager.py:3980
2013-05-16 09:22:29.480 DEBUG nova.openstack.common.periodic_task [-] Running periodic task ComputeManager._report_driver_status from (pid=32776) run_periodic_tasks /opt/stack/nova/nova/openstack/common/periodic_task.py:175
2013-05-16 09:22:29.481 INFO nova.compute.manager [-] Updating host status
2013-05-16 09:22:29.496 DEBUG amqp [-] Closed channel #1 from (pid=32776) _do_close /usr/local/lib/python2.7/dist-packages/amqp/channel.py:88
2013-05-16 09:22:29.497 DEBUG amqp [-] using channel_id: 1 from (pid=32776) __init__ /usr/local/lib/python2.7/dist-packages/amqp/channel.py:70
2013-05-16 09:22:29.498 DEBUG amqp [-] Channel open from (pid=32776) _open_ok /usr/local/lib/python2.7/dist-packages/amqp/channel.py:420
2013-05-16 09:22:29.606 WARNING nova.virt.vmwareapi.driver [-] Task [DeleteDatastoreFile_Task] (returnval){
   value = "task-273"
   _type = "Task"
 } status: error File [datastore01] instance-00000009 was not found
2013-05-16 09:22:29.607 WARNING nova.virt.vmwareapi.driver [-] In vmwareapi:_poll_task, Got this error Trying to re-send() an already-triggered event.
2013-05-16 09:22:29.607 ERROR nova.openstack.common.loopingcall [-] in fixed duration looping call
2013-05-16 09:22:29.607 TRACE nova.openstack.common.loopingcall Traceback (most recent call last):
2013-05-16 09:22:29.607 TRACE nova.openstack.common.loopingcall File "/opt/stack/nova/nova/openstack/common/loopingcall.py", line 78, in _inner
2013-05-16 09:22:29.607 TRACE nova.openstack.common.loopingcall self.f(*self.args, **self.kw)
2013-05-16 09:22:29.607 TRACE nova.openstack.common.loopingcall File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 585, in _poll_task
2013-05-16 09:22:29.607 TRACE nova.openstack.common.loopingcall done.send_exception(excep)
2013-05-16 09:22:29.607 TRACE nova.openstack.common.loopingcall File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 208, in send_exception
2013-05-16 09:22:29.607 TRACE nova.openstack.common.loopingcall return self.send(None, args)
2013-05-16 09:22:29.607 TRACE nova.openstack.common.loopingcall File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 150, in send
2013-05-16 09:22:29.607 TRACE nova.openstack.common.loopingcall assert self._result is NOT_USED, 'Trying to re-send() an already-triggered event.'
2013-05-16 09:22:29.607 TRACE nova.openstack.common.loopingcall AssertionError: Trying to re-send() an already-triggered event.
2013-05-16 09:22:29.607 TRACE nova.openstack.common.loopingcall
2013-05-16 09:22:30.867 DEBUG nova.openstack.common.rpc.amqp [-] Making synchronous call on conductor ... from (pid=32776) multicall /opt/stack/nova/nova/openstack/common/rpc/amqp.py:586
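
For context, the failure above happens inside volumeops.attach_disk_to_vm, which issues a ReconfigVM_Task adding a VirtualDisk backed by a datastore path. A rough sketch of that kind of call (illustrative pyVmomi, not nova's actual suds-based code; names and keys are placeholders):

    # Rough sketch of the attach that fails with "Invalid configuration
    # for device '1'": vCenter rejects the disk because the host chosen
    # for the VM cannot resolve the VMDK's datastore path.
    from pyVmomi import vim

    def attach_vmdk(vm, datastore_path, controller_key=1000):
        # Back an existing VMDK; the path is the "[datastore] dir/file.vmdk"
        # form seen in the logs above.
        backing = vim.vm.device.VirtualDisk.FlatVer2BackingInfo(
            fileName=datastore_path, diskMode='persistent')
        disk = vim.vm.device.VirtualDisk(
            controllerKey=controller_key, unitNumber=0,
            capacityInKB=0,          # existing disk; size read from the VMDK
            backing=backing)
        change = vim.vm.device.VirtualDeviceSpec(
            operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
            device=disk)
        return vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change]))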

Tags: vmware
Changed in nova:
assignee: nobody → Shawn Hartsock (hartsock)
Michael Still (mikal)
Changed in nova:
status: New → Confirmed
importance: Undecided → Critical
Revision history for this message
Shawn Hartsock (hartsock) wrote :

Note: I am actively working on this.

Changed in nova:
status: Confirmed → In Progress
summary: - nova compute fails when vmware cluster has more than one ESXi Host
+ nova compute fails when vmware cluster has more than one ESXi Host and
+ NO shared datastores
summary: - nova compute fails when vmware cluster has more than one ESXi Host and
- NO shared datastores
+ nova compute fails when vmware cluster has NO shared datastores
Changed in nova:
assignee: Shawn Hartsock (hartsock) → nobody
Changed in nova:
assignee: nobody → Shawn Hartsock (hartsock)
Changed in nova:
milestone: none → havana-2
summary: - nova compute fails when vmware cluster has NO shared datastores
+ nova boot fails to attach vmdk in multi-host-cluster
Revision history for this message
Shawn Hartsock (hartsock) wrote : Re: nova boot fails to attach vmdk in multi-host-cluster

This only occurs with local storage.

Changed in nova:
importance: Critical → High
importance: High → Medium
summary: - nova boot fails to attach vmdk in multi-host-cluster
+ nova boot fails to attach vmdk in multi-host-cluster without DRS
Changed in nova:
importance: Medium → High
Revision history for this message
Shawn Hartsock (hartsock) wrote : Re: nova boot fails to attach vmdk in multi-host-cluster without DRS

I am currently working on a traversal spec for this problem. My current solution has introduced a new bug, so I'm troubleshooting that before I post a fix.
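
For reference, a "traversal spec" here means a PropertyCollector TraversalSpec that walks the vCenter inventory, e.g. from a cluster to its member hosts, so placement can choose a host that can actually reach the VMDK. A hypothetical sketch in pyVmomi (not the actual patch):

    # Hypothetical traversal-spec sketch: retrieve the HostSystem names
    # belonging to a cluster via the PropertyCollector.
    from pyVmomi import vim, vmodl

    def hosts_in_cluster(si, cluster):
        pc = si.content.propertyCollector

        # From a ComputeResource (ClusterComputeResource inherits it),
        # follow its "host" property to reach the HostSystem objects.
        traversal = vmodl.query.PropertyCollector.TraversalSpec(
            name='cluster_to_hosts',
            type=vim.ComputeResource, path='host', skip=False)

        obj_spec = vmodl.query.PropertyCollector.ObjectSpec(
            obj=cluster, skip=True, selectSet=[traversal])
        prop_spec = vmodl.query.PropertyCollector.PropertySpec(
            type=vim.HostSystem, pathSet=['name'])
        filter_spec = vmodl.query.PropertyCollector.FilterSpec(
            objectSet=[obj_spec], propSet=[prop_spec])

        result = pc.RetrieveContents([filter_spec])
        return [obj.propSet[0].val for obj in result]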

summary: - nova boot fails to attach vmdk in multi-host-cluster without DRS
+ nova boot fails to attach vmdk in multi-host environments
description: updated
Revision history for this message
Shawn Hartsock (hartsock) wrote : Re: nova boot fails to attach vmdk in multi-host environments

We've narrowed this problem down to situations where the vCenter inventory is not composed *only* of clusters with DRS and automatic placement turned on. So, after working this bug for a while, it does not seem as critical as it did.

Revision history for this message
dan wendlandt (danwent) wrote :

Can you update the title to indicate that this is less severe?

Revision history for this message
Shawn Hartsock (hartsock) wrote :

Follow up: Does this issue occur when there are multiple datacenters?

summary: - nova boot fails to attach vmdk in multi-host environments
+ nova boot fails to attach vmdk in multi-host environments without DRS
+ properly enabled
Changed in nova:
milestone: havana-2 → none
importance: High → Medium