VMware: spawning large amounts of VMs concurrently sometimes causes "VMDK lock" error
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Compute (nova) | Fix Released | High | Gary Kotton | |
VMwareAPI-Team | Confirmed | High | Gary Kotton | |
Bug Description
When using the VMwareVCDriver, spawning a large number of virtual machines concurrently causes some instances to end up with status ERROR. The number of instances that fail to build is unpredictable, and sometimes all instances spawn successfully.
The issue can be reproduced by running:
nova boot --image debian-2.6.32-i686 --flavor 1 --num-instances 32 nameless
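Because the failure count varies from run to run, it helps to tally instance states after each attempt. The following is a minimal sketch, assuming the instance listing is saved as a pipe-delimited text table with a `Status` column; the function name `count_statuses` and the sample rows are hypothetical and not taken from this report.

```python
from collections import Counter


def count_statuses(table_text):
    """Count values in the Status column of a pipe-delimited text table."""
    status_idx = None
    counts = Counter()
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if status_idx is None:
            # The first pipe-delimited row is treated as the header.
            status_idx = cells.index("Status")
            continue
        counts[cells[status_idx]] += 1
    return counts


# Hypothetical listing, for illustration only.
SAMPLE = """
+----+------------+--------+
| ID | Name       | Status |
+----+------------+--------+
| 1  | nameless-1 | ACTIVE |
| 2  | nameless-2 | ERROR  |
| 3  | nameless-3 | ACTIVE |
+----+------------+--------+
"""

if __name__ == "__main__":
    print(dict(count_statuses(SAMPLE)))
```

Comparing the ERROR count across runs with different `--num-instances` values gives a rough idea of where failures start in a given environment.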
The number of instances needed to trigger the errors differs from environment to environment; start with 30-40. Two errors in the logs cause the instance spawn failures. The first is the ESX host not finding the image in the NFS datastore (even though it is there, otherwise other instances couldn't be spawned). The second is the ESX host not being able to access the VMDK image because it is locked.
Image not found error:
Traceback (most recent call last):
File "/opt/stack/
block_
File "/opt/stack/
admin_password, network_info, block_device_info)
File "/opt/stack/
vmdk_
File "/opt/stack/
self.
File "/opt/stack/
ret_val = done.wait()
File "/usr/local/
return hubs.get_
File "/usr/local/
return self.greenlet.
NovaException: File [ryan-nfs] vmware_
Image locked error:
Traceback (most recent call last):
File "/opt/stack/
block_
File "/opt/stack/
admin_password, network_info, block_device_info)
File "/opt/stack/
root_gb_in_kb, linked_clone)
File "/opt/stack/
self.
File "/opt/stack/
ret_val = done.wait()
File "/usr/local/
return hubs.get_
File "/usr/local/
return self.greenlet.
NovaException: Unable to access file [ryan-nfs] vmware_
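To see which of the two failures dominates in a given run, the final exception lines can be sorted by their message prefix. This is a rough sketch that relies only on the two prefixes visible in the tracebacks above; the function name `classify_spawn_error` and the category labels are hypothetical.

```python
def classify_spawn_error(log_line):
    """Classify a NovaException log line by the prefix of its message."""
    marker = "NovaException: "
    if marker not in log_line:
        return "other"
    message = log_line.split(marker, 1)[1]
    if message.startswith("Unable to access file"):
        return "locked"  # the "image locked" failure
    if message.startswith("File ["):
        return "not_found"  # the "image not found" failure
    return "other"


if __name__ == "__main__":
    # Messages are truncated exactly as they appear in the report.
    for line in (
        "NovaException: File [ryan-nfs] vmware_",
        "NovaException: Unable to access file [ryan-nfs] vmware_",
    ):
        print(classify_spawn_error(line))
```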
Environment information:
- 1 datacenter, 1 cluster, 7 hosts
- NFS shared datastore
- was able to spawn 7 instances before errors appeared
- screen log with tracebacks: http://
description: | updated |
summary: |
- VMware: errors spawning large amounts of VMs
+ VMware: spawning large amounts of VMs sometimes causes errors |
description: | updated |
Changed in nova: | |
status: | New → Confirmed |
assignee: | nobody → Vui Lam (vui) |
Changed in nova: | |
importance: | Undecided → High |
tags: | added: havana-backport-potential |
Changed in openstack-vmwareapi-team: | |
status: | New → Confirmed |
importance: | Undecided → High |
assignee: | nobody → Vui Lam (vui) |
summary: |
- VMware: spawning large amounts of VMs sometimes causes errors
+ VMware: spawning large amounts of VMs concurrently sometimes causes errors |
Changed in nova: | |
milestone: | icehouse-1 → icehouse-2 |
Changed in nova: | |
status: | Confirmed → In Progress |
Changed in nova: | |
milestone: | icehouse-2 → icehouse-3 |
Changed in nova: | |
milestone: | icehouse-3 → icehouse-rc1 |
Changed in nova: | |
status: | Fix Committed → Fix Released |
Changed in nova: | |
milestone: | icehouse-rc1 → 2014.1 |
Have you changed the task_poll_interval config option? The default is 5 seconds; with an NFS datastore, it might help to increase it to a more appropriate value.
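To check which value is in effect, the option can be read from the configuration file. This is a minimal sketch, assuming an INI-style config and assuming the option lives in a `[vmware]` group; the comment above names only the option and its 5 second default, so the group name and the helper `read_task_poll_interval` are assumptions.

```python
import configparser


def read_task_poll_interval(conf_text, group="vmware", default=5.0):
    """Return task_poll_interval from INI-style text, or the default.

    The group name is an assumption; adjust it to match the deployment.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getfloat(group, "task_poll_interval", fallback=default)


if __name__ == "__main__":
    # Hypothetical config fragment, for illustration only.
    sample = "[vmware]\ntask_poll_interval = 10\n"
    print(read_task_poll_interval(sample))
```

If the option is absent, the function falls back to the 5 second default mentioned in the comment.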