nova libvirt driver assumes libvirt support for CPU pinning

Bug #1438226 reported by Stephen Finucane
This bug affects 1 person
    Affects                   Status        Importance  Assigned to       Milestone
    -------                   ------        ----------  -----------       ---------
    OpenStack Compute (nova)  Fix Released  High        Stephen Finucane

Bug Description

CPU pinning support was implemented as part of this blueprint:

    http://specs.openstack.org/openstack/nova-specs/specs/juno/approved/virt-driver-cpu-pinning.html

However, CPU pinning support is broken in some libvirt versions (summarized below), resulting in exceptions when attempting to schedule instances with the 'hw:cpu_policy' flavor key.

We should add a libvirt version test against known broken versions and use that to determine whether or not to support the flavor keys.
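A minimal sketch of such a version test follows. It is illustrative only and is not the code that was merged: the names `KNOWN_BROKEN_PINNING_VERSIONS`, `parse_version` and `supports_cpu_pinning` are hypothetical, and the version list is simply copied from the results table in this report.

```python
# Hypothetical sketch: none of these names come from nova or libvirt.
# The broken versions are the ones marked "fail" in this report's results.
KNOWN_BROKEN_PINNING_VERSIONS = frozenset({(1, 2, 9, 2), (1, 2, 10)})


def parse_version(version):
    """Convert a dotted version string such as '1.2.9.2' to a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def supports_cpu_pinning(libvirt_version):
    """Return False for libvirt versions found to be broken in testing."""
    return parse_version(libvirt_version) not in KNOWN_BROKEN_PINNING_VERSIONS


if __name__ == "__main__":
    for version in ("1.2.9.1", "1.2.9.2", "1.2.10", "1.2.11"):
        print(version, supports_cpu_pinning(version))
```

Tuples are used here rather than libvirt's integer version encoding (major * 1,000,000 + minor * 1,000 + release) because that encoding has no field for the fourth component of a maintenance release such as 1.2.9.2.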

This is somewhat related to #1422775 ("nova libvirt driver assumes qemu support for NUMA pinning").

---

# Testing Configuration

Testing was conducted in a container which provided a single-node, Fedora 21-based (3.17.8-300.fc21.x86_64) OpenStack instance (built with devstack). The yum-provided libvirt and its dependencies were removed and libvirt and libvirt-python were built and installed from source.

# Results

The results are as follows:

    version   status
    -------   ------
    1.2.9     ok
    1.2.9.1   ok
    1.2.9.2   fail
    1.2.10    fail
    1.2.11    ok
    1.2.12    ok

v1.2.9.2 is broken by this (backported) patch:

    https://www.redhat.com/archives/libvir-list/2014-November/msg00275.html

This can be seen as commit

    e226772 (qemu: fix domain startup failing with 'strict' mode in numatune)

v1.2.10 is broken at checkout but can be fixed by applying these three patches (yes, one of these broke v1.2.9.2; the irony is not lost on me):

    [0/3] https://www.redhat.com/archives/libvir-list/2014-November/msg00274.html
    [1/3] https://www.redhat.com/archives/libvir-list/2014-November/msg00273.html
    [2/3] https://www.redhat.com/archives/libvir-list/2014-November/msg00276.html
    [3/3] https://www.redhat.com/archives/libvir-list/2014-November/msg00275.html

# Error logs

v1.2.9.2 produces the following exception:

    Traceback (most recent call last):
      File "/opt/stack/nova/nova/compute/manager.py", line 2301, in _build_resources
        yield resources
      File "/opt/stack/nova/nova/compute/manager.py", line 2171, in _build_and_run_instance
        flavor=flavor)
      File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2357, in spawn
        block_device_info=block_device_info)
      File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 4376, in _create_domain_and_network
        power_on=power_on)
      File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 4307, in _create_domain
        LOG.error(err)
      File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 82, in __exit__
        six.reraise(self.type_, self.value, self.tb)
      File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 4297, in _create_domain
        domain.createWithFlags(launch_flags)
      File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
        result = proxy_call(self._autowrap, f, *args, **kwargs)
      File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
        rv = execute(f, *args, **kwargs)
      File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
        six.reraise(c, e, tb)
      File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
        rv = meth(*args, **kwargs)
      File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1029, in createWithFlags
        if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
    libvirtError: Failed to create controller cpu for group: No such file or directory

v1.2.10 produces the following exception:

    Traceback (most recent call last):
      File "/opt/stack/nova/nova/compute/manager.py", line 2342, in _build_resources
        yield resources
      File "/opt/stack/nova/nova/compute/manager.py", line 2215, in _build_and_run_instance
        block_device_info=block_device_info)
      File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2356, in spawn
        block_device_info=block_device_info)
      File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 4375, in _create_domain_and_network
        power_on=power_on)
      File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 4306, in _create_domain
        LOG.error(err)
      File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
        six.reraise(self.type_, self.value, self.tb)
      File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 4296, in _create_domain
        domain.createWithFlags(launch_flags)
      File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
        result = proxy_call(self._autowrap, f, *args, **kwargs)
      File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
        rv = execute(f, *args, **kwargs)
      File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
        six.reraise(c, e, tb)
      File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
        rv = meth(*args, **kwargs)
      File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1037, in createWithFlags
        if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
    libvirtError: Unable to write to '/sys/fs/cgroup/cpuset/system.slice/docker.service/machine.slice/machine-qemu\x2dinstance\x2d0000000a.scope/cpuset.mems': Device or resource busy

Tags: libvirt
Sean Dague (sdague) wrote :

Because this should *hopefully* be a small conditional fix, I think it should end up on the RC list.

Changed in nova:
status: New → Confirmed
importance: Undecided → High
milestone: none → kilo-rc1
tags: added: numa
tags: removed: numa
Changed in nova:
assignee: nobody → Radoslaw Smigielski (radoslaw-smigielski)
John Garbutt (johngarbutt) wrote :

Seems related to this patch: https://review.openstack.org/#/c/159106/

Nikola Đipanov (ndipanov) wrote :

Not marking as duplicate, because this bug raises the problem with respect to CPU pinning, which is not directly mentioned by the other bugs (but is addressed by the patch https://review.openstack.org/#/c/159106/). Instead, I will make sure the patch references this bug as well.

Nikola Đipanov (ndipanov) wrote :

Actually, looking into this further, it seems that the proposed patch does not cover this. This bug refers to specific versions of libvirt that have been found to be broken.

tags: added: kilo-rc-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/170190

Changed in nova:
assignee: Radoslaw Smigielski (radoslaw-smigielski) → Stephen Finucane (stephen-finucane)
status: Confirmed → In Progress
Stephen Finucane (stephenfinucane) wrote :

Apologies @radoslaw-smigielski for not assigning myself sooner.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/170190
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c380987aa8844a46b2d50e55350e89e0791d76b6
Submitter: Jenkins
Branch: master

commit c380987aa8844a46b2d50e55350e89e0791d76b6
Author: Stephen Finucane <email address hidden>
Date: Tue Mar 31 11:03:48 2015 +0100

    libvirt: Add version check when pinning guest CPUs

    Ensure versions of libvirt with broken CPU pinning support are not used
    for said feature. This requires the addition of a new Exception,
    specific version check functionality and unit tests for same.

    Change-Id: I03b462c4985517ff8a4d94f0e1acae4fabdc5d39
    Closes-Bug: #1438226

Changed in nova:
status: In Progress → Fix Committed
tags: removed: kilo-rc-potential
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-rc1 → 2015.1.0
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/178188
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7ca56106def7950aceecacf40b2ae8de7c846cb2
Submitter: Jenkins
Branch: master

commit 7ca56106def7950aceecacf40b2ae8de7c846cb2
Author: Stephen Finucane <email address hidden>
Date: Thu Apr 23 14:01:10 2015 +0100

    libvirt: Disable NUMA for broken libvirt

    Ensure versions of libvirt with broken NUMA tuning support are not used
    for said feature.

    Change-Id: I6de388f1ca98c1ae16f2968f59881e3b0dba5f8d
    Closes-Bug: #1449028
    Related-Bug: #1438226
