Major performance issue with libvirt related to number of VMs on a host
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Critical | Jay Pipes |
Bug Description
Hi All,
We hit a performance issue with KVM on our Diablo based build, which as far as I can see still applies to Essex.
Whilst I have a fix for our baseline, I'm unlikely to produce an Essex based fix in the time that the community would want. As it took me quite a while to work out what was happening, I'm hoping that by posting the details of the problem and how we've addressed it, someone else can quickly produce an Essex equivalent. Happy to answer any questions or help out in any other way.
The symptom is that we saw a degradation in VM start-up time on KVM that is directly proportional to the number of VMs already running on the host, and reaches a critical tipping point when that number reaches around 30.
What we found is that there are two places in the compute manager code that make calls to libvirt which take around 2 seconds each, and these calls are made for every VM on the host. These loops do not get pre-empted.
1) During creation of a VM the compute manager checks if a VM with the same name already exists (in Diablo this was in _run_instance; in Essex it has been refactored as _check_instance_not_already_created).
2) The _sync_power_states periodic task which updates the power_state value in the DB.
Note that neither of these gets pre-empted, so the compute manager will block all other processing for ~2 seconds x number_of_VMs. Once the number of VMs reaches 30, the compute manager can block for over 60 seconds, which in turn means that the service update thread will not run in time and the compute service will become unavailable (in itself this isn't a bad thing, as it stops the scheduler from sending any more requests to that host).
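The tipping point described above can be sketched with simple arithmetic. The 2-second per-call figure and the ~60-second service timeout are taken from the observations in this report; everything else here is an illustrative model, not nova code:

```python
# Back-of-the-envelope model of the un-preempted blocking described above.
# PER_CALL_SECONDS is the observed latency of one libvirt call; the
# 60-second figure is the point past which the service update thread
# misses its window and the compute service is marked unavailable.
PER_CALL_SECONDS = 2.0
SERVICE_TIMEOUT_SECONDS = 60.0

def worst_case_block(num_vms, per_call=PER_CALL_SECONDS):
    """Seconds the compute manager blocks while looping over all VMs."""
    return num_vms * per_call

# Smallest VM count at which one loop exceeds the service timeout.
tipping_point = next(
    n for n in range(1, 10_000)
    if worst_case_block(n) >= SERVICE_TIMEOUT_SECONDS
)
print(tipping_point)  # 30, matching the observed critical point
```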
The periodic task has the biggest impact as it runs with an interval of 60 seconds – so at minute intervals the compute manager is blocked for over a minute. Requests for new VMs also now take over a minute to get through the _check_instance_not_already_created check.
It's possible that a more efficient way of getting the power state from libvirt can be found (it seems that, like me, libvirt is built for comfort rather than speed) – but we haven't found one yet, so instead we took the following approach:
- Refactor _check_instance_not_already_created
- Modify list_instances_
- Create a separate eventlet to run the _sync_power_state – this is because even with the sleep(0) in place it would otherwise still block other periodic tasks, some of which (such as checks on resource allocation and billing data generation) we want to run every minute even if the _sync_power_state is going to take longer than that to run. This new eventlet now runs less frequently – say every 10 minutes. Note that just changing the interval for _sync_power_state isn't enough; it needs to be in a separate eventlet to avoid blocking other periodic tasks. Maybe a more general fix would be to create separate eventlets for every periodic task – rather than just have them run at different intervals in the same eventlet.
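The interleaving the fixes above aim for can be illustrated without eventlet. The sketch below uses plain Python generators as a stand-in for green threads: the `yield` after each VM plays the role of `greenthread.sleep(0)`, letting a short periodic task run before the slow sync over all VMs finishes. This is a minimal model of the cooperative-yield idea, not the actual nova/eventlet code:

```python
import collections

def slow_sync_task(vm_count, log):
    """Simulates _sync_power_states: one slow hypervisor call per VM,
    yielding control after each one (analogous to greenthread.sleep(0))."""
    for i in range(vm_count):
        log.append(("sync", i))
        yield  # cooperative yield point

def fast_periodic_task(name, runs, log):
    """Simulates a quick periodic task (e.g. billing data generation)."""
    for i in range(runs):
        log.append((name, i))
        yield

def round_robin(tasks):
    """Minimal cooperative scheduler: interleaves tasks at yield points."""
    queue = collections.deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
            queue.append(task)  # still has work: back of the queue
        except StopIteration:
            pass                # finished: drop it

log = []
round_robin([slow_sync_task(5, log), fast_periodic_task("billing", 3, log)])
print(log)
```

Without the per-VM yield, the sync task would emit all five entries before "billing" ran at all; with it, the billing entries land in between.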
It seems that in Essex the EC2 API no longer returns power_state as it did in Diablo, so the user experience isn't affected by the increased refresh interval. However, there are some cases in the code that do rely on the value in the DB. In all but one case these can be changed to refresh it from libvirt first:
compute/
compute/
compute/
compute/
compute/
compute/
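The "refresh it from libvirt first" pattern mentioned above could look roughly like the sketch below. All the names here (`get_info`, `db_update`, the instance dict shape) are illustrative stand-ins, not the actual Essex API:

```python
# Hypothetical sketch of refreshing a possibly-stale DB power_state
# with one targeted hypervisor call before using it.

def current_power_state(driver, db_update, instance):
    """Fetch power_state from the hypervisor for one VM and sync the DB.

    One call for one VM, instead of trusting a DB value that may be up
    to a full refresh interval (e.g. 10 minutes) stale.
    """
    state = driver.get_info(instance["name"])["state"]
    if state != instance["power_state"]:
        db_update(instance["id"], {"power_state": state})
        instance["power_state"] = state
    return state

# Minimal stand-ins to exercise the helper:
class FakeDriver:
    def get_info(self, name):
        return {"state": "running"}  # pretend the hypervisor says RUNNING

updates = []
instance = {"id": 42, "name": "instance-0000002a", "power_state": "shutdown"}
state = current_power_state(
    FakeDriver(), lambda i, v: updates.append((i, v)), instance)
print(state, updates)
```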
Hope this helps and is clear – as I said happy to answer any questions (<email address hidden>)
Changed in nova: | |
status: | New → Triaged |
importance: | Undecided → Critical |
Changed in nova: | |
milestone: | none → essex-4 |
status: | Fix Committed → Fix Released |
Changed in nova: | |
milestone: | essex-4 → 2012.1 |
The existing method in _check_instance_not_already_created() of looking for a supplied instance name by looping over the return from driver.list_instances() is horribly inefficient.
Run the attached script to show how inefficient it is...
On a host with just 10 tiny instances running, doing a simple libvirt.connection.lookupByName(XXX) is an order of magnitude faster than looping through the results of listDomainsID() and looking up the name of each instance using lookupByID():
jpipes@librebox:~/repos/junk$ python test_check_instance_name.py
Num running domains: 10
name in [conn.lookupByID(i).name() for i in conn.listDomainsID()]
Found: False
took 0.00712 seconds
try: conn.lookupByName(x)
    found = True
except libvirt.libvirtError:
    found = False
libvir: QEMU error : Domain not found: no domain with matching name 'instance-1234567'
Found: False
took 0.00083 seconds
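The algorithmic difference the timings above measure can be shown without a hypervisor. The `FakeConn` below is a stand-in that counts round trips, mimicking the shape of the libvirt binding (`listDomainsID`, `lookupByID`, `lookupByName`) but not actually talking to libvirt; `KeyError` stands in for `libvirt.libvirtError`:

```python
# Stand-in objects mimicking the two libvirt lookup styles compared above.
class FakeDomain:
    def __init__(self, dom_id, name):
        self._id, self._name = dom_id, name
    def name(self):
        return self._name

class FakeConn:
    """Counts round trips; each method call models one trip to libvirtd."""
    def __init__(self, domains):
        self._by_id = {d._id: d for d in domains}
        self._by_name = {d.name(): d for d in domains}
        self.calls = 0
    def listDomainsID(self):
        self.calls += 1
        return list(self._by_id)
    def lookupByID(self, dom_id):
        self.calls += 1
        return self._by_id[dom_id]
    def lookupByName(self, name):
        self.calls += 1
        if name not in self._by_name:
            raise KeyError(name)  # stands in for libvirt.libvirtError
        return self._by_name[name]

conn = FakeConn([FakeDomain(i, "instance-%08x" % i) for i in range(10)])

# Style 1: loop over every domain -- O(N) round trips.
conn.calls = 0
found_loop = "instance-1234567" in [
    conn.lookupByID(i).name() for i in conn.listDomainsID()]
loop_calls = conn.calls  # 1 list call + 10 lookups = 11

# Style 2: ask for the name directly -- one round trip.
conn.calls = 0
try:
    conn.lookupByName("instance-1234567")
    found_direct = True
except KeyError:
    found_direct = False
direct_calls = conn.calls  # 1

print(loop_calls, direct_calls)
```

The constant per-call cost is what makes style 1 degrade linearly with the number of running VMs, which is exactly the behaviour reported above.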