VMware: timeouts due to nova-compute stuck at 100% CPU when deploying 100 VMs

Bug #1258179 reported by Gary Kotton
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  High        Gary Kotton
VMwareAPI-Team            In Progress   High        Unassigned

Bug Description

When hundreds of VMs are deployed, nova-compute runs into problems. Each interaction with a VM goes through get_vm_ref, which reads all of the VMs on the system and then filters them by UUID. The filtering is done on the client side.

There are specific APIs that optimize this search, in particular FindAllByUuid on SearchIndex: http://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.wssdk.apiref.doc%2Fvim.SearchIndex.html
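To illustrate the difference in cost, here is a minimal sketch that contrasts client-side filtering with a server-side lookup by UUID. It does not call the vSphere API: `VmRecord`, `FakeInventory`, `list_all`, `find_by_uuid`, and the `records_transferred` counter are all hypothetical names that model the inventory in memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmRecord:
    """Hypothetical stand-in for one VM entry in the inventory."""
    ref: str            # a VM reference, e.g. "vm-17"
    instance_uuid: str  # the UUID nova uses to identify the instance


class FakeInventory:
    """In-memory model of an inventory; not the real vSphere API."""

    def __init__(self, vms):
        self._vms = list(vms)
        # Index kept on the "server" side, similar in spirit to SearchIndex.
        self._by_uuid = {vm.instance_uuid: vm for vm in self._vms}
        self.records_transferred = 0

    def list_all(self):
        # Models reading every VM: the cost grows with the inventory size.
        self.records_transferred += len(self._vms)
        return list(self._vms)

    def find_by_uuid(self, uuid):
        # Models a server-side lookup: at most one record comes back.
        vm = self._by_uuid.get(uuid)
        if vm is not None:
            self.records_transferred += 1
        return vm


def get_vm_ref_by_scan(inventory, uuid):
    """Client-side filtering, as described in this report."""
    for vm in inventory.list_all():
        if vm.instance_uuid == uuid:
            return vm.ref
    return None


def get_vm_ref_by_index(inventory, uuid):
    """Server-side lookup, the approach proposed in this report."""
    vm = inventory.find_by_uuid(uuid)
    return vm.ref if vm is not None else None


inventory = FakeInventory(
    VmRecord(ref=f"vm-{i}", instance_uuid=f"uuid-{i}") for i in range(500)
)

scan_ref = get_vm_ref_by_scan(inventory, "uuid-499")
scan_cost = inventory.records_transferred

inventory.records_transferred = 0
index_ref = get_vm_ref_by_index(inventory, "uuid-499")
index_cost = inventory.records_transferred

print(scan_ref, scan_cost)    # vm-499 500
print(index_ref, index_cost)  # vm-499 1
```

In this model every scan transfers the whole inventory, so the work per VM operation grows with the number of deployed VMs, while the indexed lookup stays constant.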

Gary Kotton (garyk)
tags: added: grizzly-backport-potential havana-backport-potential vmware
Changed in nova:
assignee: nobody → Gary Kotton (garyk)
milestone: none → icehouse-2
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/60259

Changed in nova:
status: New → In Progress
Gary Kotton (garyk)
Changed in openstack-vmwareapi-team:
importance: Undecided → Critical
dan wendlandt (danwent)
Changed in openstack-vmwareapi-team:
importance: Critical → High
Tracy Jones (tjones-i)
Changed in openstack-vmwareapi-team:
status: New → In Progress
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-2 → icehouse-3
Shawn Hartsock (hartsock) wrote :

While I agree this is a major scalability problem I would like to see it solved by resolving the underlying issue. The current version of the driver iterates over a list of every virtual machine in the entire vCenter inventory. This list is fetched on every VM operation.

In database terms this is the equivalent of doing "select * from VirtualMachine" and iterating over the whole result set until you find the one virtual machine you need. This is tremendously wasteful even if you only do it once per virtual machine in your inventory.
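The database analogy can be made concrete with a small sketch using Python's built-in sqlite3 module. The table layout, the column names `ref` and `uuid`, and the index name are assumptions chosen for illustration; only the table name comes from the comment above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VirtualMachine (ref TEXT, uuid TEXT)")
conn.executemany(
    "INSERT INTO VirtualMachine VALUES (?, ?)",
    [(f"vm-{i}", f"uuid-{i}") for i in range(1000)],
)
# An index lets the database resolve the WHERE clause without a table scan.
conn.execute("CREATE UNIQUE INDEX idx_vm_uuid ON VirtualMachine (uuid)")


def find_by_full_scan(conn, uuid):
    """Fetch every row and filter in the client, counting rows read."""
    rows_read = 0
    for ref, row_uuid in conn.execute(
        "SELECT * FROM VirtualMachine ORDER BY rowid"
    ):
        rows_read += 1
        if row_uuid == uuid:
            return ref, rows_read
    return None, rows_read


def find_by_where(conn, uuid):
    """Let the database do the filtering; only matching rows come back."""
    rows = conn.execute(
        "SELECT ref FROM VirtualMachine WHERE uuid = ?", (uuid,)
    ).fetchall()
    return (rows[0][0] if rows else None), len(rows)


scan_result = find_by_full_scan(conn, "uuid-999")
where_result = find_by_where(conn, "uuid-999")
print(scan_result)   # ('vm-999', 1000)
print(where_result)  # ('vm-999', 1)
```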

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/60259
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=933603ed8523493d0693f02f62fef6d427de421f
Submitter: Jenkins
Branch: master

commit 933603ed8523493d0693f02f62fef6d427de421f
Author: Gary Kotton <email address hidden>
Date: Thu Dec 5 06:50:06 2013 -0800

    VMware: optimize instance reference access

    Fix bug causing nova-compute CPU to spike to 100%.

    When there are hundreds of VMs running each time a VM is referenced
    all of the VMs in the system will be read by nova-compute and then
    filtered according to the UUID.

    This is addressed by using an API (FindAllByUuid) which reads only
    the specific VM. When a VM is created the config spec will be updated
    with the UUID of the VM - that is, the field 'instanceUuid' will be
    set. The search is later done on this field.

    If the search fails then the old code will be invoked - this ensures
    backward compatibility with running VM's. Thus all VM's created
    without the 'instanceUuid' set will not be affected.

    In addition to optimizing the search we also cache the VM reference.
    This ensures that additional calls for the specific VM do not need
    to query the backend for the reference.

    Change-Id: I00d6c29f46b06d082cf3af0369a69147a3376341
    Closes-bug: #1258179
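The lookup order described in the commit message (cache, then search on 'instanceUuid', then the old full scan as a fallback) can be sketched roughly as follows. This is not the code from the merged change: `VmRefLookup`, its two callable parameters, and the `forget` method are hypothetical, and the backend calls are replaced by plain dictionaries.

```python
class VmRefLookup:
    """Hypothetical helper: cache first, fast lookup next, full scan last."""

    def __init__(self, find_by_instance_uuid, find_by_full_scan):
        # Both arguments are callables that take a UUID and return a VM
        # reference or None. They stand in for the real backend calls.
        self._find_by_instance_uuid = find_by_instance_uuid
        self._find_by_full_scan = find_by_full_scan
        self._cache = {}

    def get_vm_ref(self, uuid):
        # 1. Repeated calls for the same VM never reach the backend.
        if uuid in self._cache:
            return self._cache[uuid]
        # 2. Fast path: VMs created with 'instanceUuid' set.
        ref = self._find_by_instance_uuid(uuid)
        # 3. Fallback: VMs created before the field was set.
        if ref is None:
            ref = self._find_by_full_scan(uuid)
        if ref is None:
            raise LookupError(f"no VM found for instance {uuid}")
        self._cache[uuid] = ref
        return ref

    def forget(self, uuid):
        # Assumption: the cache entry should be dropped when a VM goes away.
        self._cache.pop(uuid, None)


calls = []
tagged_vms = {"uuid-new": "vm-101"}  # created with 'instanceUuid' set
legacy_vms = {"uuid-old": "vm-7"}    # created before the change


def fast_lookup(uuid):
    calls.append("fast")
    return tagged_vms.get(uuid)


def full_scan(uuid):
    calls.append("scan")
    return {**tagged_vms, **legacy_vms}.get(uuid)


lookup = VmRefLookup(fast_lookup, full_scan)
new_ref = lookup.get_vm_ref("uuid-new")
cached_ref = lookup.get_vm_ref("uuid-new")  # served from the cache
calls_after_new = list(calls)
old_ref = lookup.get_vm_ref("uuid-old")     # falls back to the scan

print(new_ref, cached_ref, calls_after_new)  # vm-101 vm-101 ['fast']
print(old_ref, calls)                        # vm-7 ['fast', 'fast', 'scan']
```

Trying the fast path before the scan keeps VMs created without 'instanceUuid' working, at the price of one extra lookup the first time each of them is referenced.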

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Alan Pevec (apevec)
tags: removed: grizzly-backport-potential
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-3 → 2014.1