Libvirt get_available_resource is reporting incorrect vcpus_used data for QEMU/LXC instances

Bug #1638889 reported by Daniel Berrange
| Affects                  | Status       | Importance | Assigned to      | Milestone |
| OpenStack Compute (nova) | Fix Released | Low        | Stephen Finucane | -         |

Bug Description

Currently if Nova is using the libvirt LXC driver, it is hardcoded to report 1 vCPU used on the host, regardless of how many containers are running.

Meanwhile, for QEMU (aka TCG) guests, the guest.get_vcpu_info method throws an exception, since QEMU does not currently use a dedicated thread per vCPU. The effect is that on QEMU hosts we report 0 vCPUs used on the host, regardless of how many guests are running.

This causes the 'get_available_resource' method to report incorrect 'vcpus_used' values for the compute node.

For example, with 2 instances running:

$ nova list
+--------------------------------------+-------+--------+------------+-------------+--------------------------------------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                                               |
+--------------------------------------+-------+--------+------------+-------------+--------------------------------------------------------+
| deee00d9-3903-43aa-aa33-40e869b61bf6 | demo1 | ACTIVE | -          | Running     | private=10.0.0.4, 2001:db8:8000:0:f816:3eff:fe8f:135d  |
| 3d160f7c-18fb-4c62-8464-5477be7432d0 | demo2 | ACTIVE | -          | Running     | private=10.0.0.13, 2001:db8:8000:0:f816:3eff:fef6:58d9 |
+--------------------------------------+-------+--------+------------+-------------+--------------------------------------------------------+

We're correctly recording that 2 vCPUs are used against the compute node

$ nova hypervisor-show 1 | grep vcpus
| vcpus | 12 |
| vcpus_used | 2 |

but when reporting the hypervisor's view of available vCPUs, the value never drops below 12. With 2 of 12 vCPUs in use it should report 10, but it reports 12:

$ grep 'Hypervisor: free VCPUs' ../logs/n-cpu.log | tail
2016-11-03 11:17:24.003 19647 DEBUG nova.compute.resource_tracker [req-559dcffd-b4c8-494b-b1f2-f936346132cf - -] Hypervisor: free VCPUs: 12 _report_hypervisor_resource_view /home/ds-f23-master/openstack/nova/nova/compute/resource_tracker.py:623

The resource tracker ignores the vcpus_used value reported by the hypervisor (which is arguably a bug in itself, because it causes the tracker to over-count QEMU CPU usage), so it is not affected by this libvirt bug; the only visible effect is misleading log messages. Nonetheless, we should fix the libvirt reporting so that the resource tracker can honour this data in the future.
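The behaviour being asked for can be illustrated with a small, self-contained sketch. This is not Nova's code: `FakeGuest`, `VcpuInfoUnavailable` and `count_vcpus_used` are hypothetical names invented for illustration, and only the method name `get_vcpu_info` is taken from the report. The sketch assumes that a guest whose hypervisor cannot provide per-vCPU details should be counted as 1 vCPU rather than 0.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


class VcpuInfoUnavailable(Exception):
    """Hypothetical error: per-vCPU details cannot be reported for a guest."""


@dataclass
class FakeGuest:
    """Hypothetical stand-in for a libvirt guest wrapper (not Nova's class)."""
    name: str
    vcpus: Optional[int]  # None models a guest without per-vCPU info (QEMU/LXC)

    def get_vcpu_info(self) -> List[int]:
        # Mirrors the failure mode from the report: raise when the
        # hypervisor has no dedicated thread per vCPU to describe.
        if self.vcpus is None:
            raise VcpuInfoUnavailable(self.name)
        return list(range(self.vcpus))


def count_vcpus_used(guests: Iterable[FakeGuest]) -> int:
    """Sum vCPUs across guests, assuming 1 vCPU when details are missing."""
    total = 0
    for guest in guests:
        try:
            total += len(guest.get_vcpu_info())
        except VcpuInfoUnavailable:
            # Assumed fallback: count the guest once instead of skipping it.
            total += 1
    return total


# Two guests without per-vCPU info on a 12-vCPU host, as in the example above.
guests = [FakeGuest("demo1", None), FakeGuest("demo2", None)]
used = count_vcpus_used(guests)
free = 12 - used
```

With this fallback, the two-instance example would yield `used == 2` and `free == 10`, instead of the 0 used and 12 free that the log currently shows.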

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/393254

Changed in nova:
assignee: nobody → Daniel Berrange (berrange)
status: New → In Progress
Changed in nova:
importance: Undecided → Low
Changed in nova:
assignee: Daniel Berrange (berrange) → Stephen Finucane (stephenfinucane)
Changed in nova:
assignee: Stephen Finucane (stephenfinucane) → Michael Still (mikal)
Changed in nova:
assignee: Michael Still (mikal) → Stephen Finucane (stephenfinucane)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/393254
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2fdab3b922b0d99f415902462de967a910a6594b
Submitter: Jenkins
Branch: master

commit 2fdab3b922b0d99f415902462de967a910a6594b
Author: Daniel P. Berrange <email address hidden>
Date: Wed Nov 2 14:46:06 2016 +0000

    libvirt: fix vCPU usage reporing for LXC/QEMU guests

    Currently if Nova is using the libvirt LXC driver, it is
    hardcoded to report 1 vCPU used on the host, regardless
    of how many containers are running.

    Meanwhile for QEMU (aka TCG) guests, the guest.get_vcpu_info
    method is throwing an exception, since QEMU does not use
    a dedicated thread per vCPU currently. The effect is that
    on QEMU hosts, we're reporting 0 vCPUs used on the host
    regardless of how many guests are running

    This causes the 'get_available_resources' method to report
    incorrect 'vcpus_used' values for the compute node. By a
    stroke of luck, the resource tracker merely logs this
    value and then throws it away, instead counting vcpu
    usage based on vcpus declared against the flavour. Now
    ignoring the hypervisor reported data is arguably a bug
    in the resource tracker, because it means it is overcounting
    resource consumption for plain QEMU guests (they can only
    ever consume 1 pCPU of time, regardless of vCPU count).
    Fixing the resource tracker is out of scope for now, but
    we should at least ensure we're reporting accurate data
    to it, even if it is only used for logging at this time.

    If a host does not report detailed vCPU usage from libvirt
    then we should default to reporting 1 vCPU per guest, so
    that the 'vcpus_used' field reports some reasonably
    meaningful data on host CPU usage.

    Closes-bug: #1638889
    Change-Id: I627d30d61f8ead6211f78a1c79ffd79b81333f86

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.0.0rc1

This issue was fixed in the openstack/nova 15.0.0.0rc1 release candidate.
