Instance crashes when guest topology is being requested

Bug #1408070 reported by Vladik Romanovsky
This bug affects 1 person
Affects                   Status        Importance  Assigned to        Milestone
OpenStack Compute (nova)  Fix Released  Medium      Vladik Romanovsky

Bug Description

The version of libvirt currently used in the gate does not return mempages in
NUMA cells. When the guest topology is requested, the instance crashes with
the following:

[instance: 730a2e16-c81c-44b6-b9b5-5c073feff9ce] Instance failed to spawn
Traceback (most recent call last):
  File "/opt/stack/new/nova/nova/compute/manager.py", line 2287, in _build_resources
    yield resources
  File "/opt/stack/new/nova/nova/compute/manager.py", line 2157, in _build_and_run_instance
    flavor=flavor)
  File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 2389, in spawn
    flavor=flavor)
  File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 4063, in _get_guest_xml
    context, flavor=flavor)
  File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 3837, in _get_guest_config
    instance.numa_topology, guest_numa_config.numatune)
  File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 3782, in _get_guest_memory_backing_config
    smallest = avail_pagesize[0]
IndexError: list index out of range
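
The failure in the last frame can be reproduced in isolation. The sketch below is not Nova's code: `Cell`, `available_page_sizes`, and `smallest_page_size` are hypothetical names, used only to illustrate how cells without mempages lead to an empty list, how indexing that list raises the `IndexError`, and how a simple guard could avoid it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cell:
    """Hypothetical stand-in for a host NUMA cell."""
    id: int
    mempages: List[int] = field(default_factory=list)  # page sizes in KiB


def available_page_sizes(cells: List[Cell]) -> List[int]:
    """Collect the distinct page sizes reported by all cells, smallest first."""
    return sorted({size for cell in cells for size in cell.mempages})


def smallest_page_size(cells: List[Cell]) -> Optional[int]:
    """Return the smallest page size, or None when no cell reports mempages."""
    sizes = available_page_sizes(cells)
    if not sizes:
        # No mempages in any cell (the old-libvirt case): nothing to index.
        return None
    return sizes[0]


# Cells as an older libvirt might report them (assumed shape): no mempages.
old_libvirt_cells = [Cell(id=0), Cell(id=1)]
new_libvirt_cells = [Cell(id=0, mempages=[4, 2048]), Cell(id=1, mempages=[4])]

try:
    available_page_sizes(old_libvirt_cells)[0]  # unguarded, as in the traceback
    unguarded_error = None
except IndexError as exc:
    unguarded_error = exc
```

With the guard, the caller receives `None` and can decide to skip the memory backing configuration instead of crashing during spawn.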

Changed in nova:
assignee: nobody → Vladik Romanovsky (vladik-romanovsky)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/145312

Changed in nova:
status: New → In Progress
Sahid Orentino (sahid-ferdjaoui) wrote :

The problem occurs when running with a libvirt version older than 1.2.8.
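
For illustration, a version gate along these lines could look like the sketch below. libvirt reports its library version as a single integer (major * 1,000,000 + minor * 1,000 + release); the constant and helper names here are hypothetical, and only the 1.2.8 threshold comes from this comment.

```python
from typing import Tuple

# Threshold taken from the comment above; the constant name is hypothetical.
MIN_LIBVIRT_HUGEPAGE_VERSION = (1, 2, 8)


def version_to_int(version: Tuple[int, int, int]) -> int:
    """Encode (major, minor, release) the way libvirt reports its version."""
    major, minor, release = version
    return major * 1_000_000 + minor * 1_000 + release


def supports_hugepages(lib_version: int) -> bool:
    """True when the encoded libvirt version is at least 1.2.8."""
    return lib_version >= version_to_int(MIN_LIBVIRT_HUGEPAGE_VERSION)
```

In practice the integer would come from the libvirt connection (for example `getLibVersion()` in the Python bindings) rather than being hard-coded.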

Changed in nova:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/145312
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3466e727546e0f7595378b8274254bed913f42ee
Submitter: Jenkins
Branch: master

commit 3466e727546e0f7595378b8274254bed913f42ee
Author: Vladik Romanovsky <email address hidden>
Date: Tue Jan 6 14:23:38 2015 -0500

    libvirt: not setting membacking when mempages are empty host topology

    The current version of Libvirt in the gate doesn't return mempages in numa
    cells. When requesting guest topology the instance is crashing.
    Not setting the guest.membacking when libvirt version is less than the
    required minimum

    Closes-Bug: #1408070

    Change-Id: Ib83062f413bf17e1fbfe2399348c3f5e1e703559

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → kilo-2
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/159106

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/159106
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=945ab28df04e22f3c1b8948972f538ee6b5e7410
Submitter: Jenkins
Branch: master

commit 945ab28df04e22f3c1b8948972f538ee6b5e7410
Author: Daniel P. Berrange <email address hidden>
Date: Wed Feb 25 10:11:14 2015 +0000

    libvirt: fix disablement of NUMA & hugepages on unsupported platforms

    Two previous commits updated the libvirt XML config generator so
    that it would omit the config elements for huge pages and NUMA
    placement when running on old libvirt:

      commit cf3a1262ecf12f4345326309c273722ebc26b466
      Author: Vladik Romanovsky <email address hidden>
      Date: Wed Jan 28 00:54:26 2015 -0500

        libvirt: avoid setting the memnodes where when it's not a supported option

      commit 3466e727546e0f7595378b8274254bed913f42ee
      Author: Vladik Romanovsky <email address hidden>
      Date: Tue Jan 6 14:23:38 2015 -0500

        libvirt: not setting membacking when mempages are empty host topology

    The problem arising from this is that the hosts are still reporting
    to the scheduler that they can support NUMA and huge pages if
    libvirt >= 1.0.4, but we are silently discarding the guest config
    if libvirt < 1.2.7 (numa) or 1.2.8 (huge pages).

    The result is that the scheduler thinks the host can support the
    requested feature and so places the guest there, but the actual
    guest that is launched is missing the feature. So the user is not
    getting what they requested.

    The correct approach is to update the "_get_host_numa_topology"
    method so that it does not report any NUMA topology in the first
    place, if the host is incapable of having guests configured in
    the way Nova requires. Likewise if the host cannot support the
    huge page configuration method required, it should not report
    availability of huge pages.

    In summary, we are moving the version checks added by the two
    commits above to the point at which host capability reporting
    is done. We add a fatal error check in the guest XML config
    generator methods, so that if there is some future scheduler
    bug causing it to not honour host capabilities, we see a
    clear error instead of silently starting guests with the
    wrong config.

    There is a further mistake in that a version number check alone
    is insufficient. We must also check that the hypervisor is
    either QEMU or KVM, to prevent the code paths running on Xen
    or LXC.

    Closes-bug: #1425115
    Related-bug: #1408070
    Related-bug: #1415333

    Change-Id: I8326596696cbb030ae95d17f3dd430d05279081a
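
The approach described in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged Nova code: every function, class, and constant name is hypothetical, and the cell dictionaries are a simplified stand-in for real host capability data. Only the thresholds (1.2.7 for NUMA, 1.2.8 for huge pages) and the QEMU/KVM restriction come from the commit message.

```python
from typing import Dict, List, Optional

# Thresholds from the commit message, encoded as libvirt version integers.
MIN_NUMA_VERSION = 1_002_007       # libvirt 1.2.7
MIN_HUGEPAGE_VERSION = 1_002_008   # libvirt 1.2.8
SUPPORTED_HYPERVISORS = {"qemu", "kvm"}


class UnsupportedHostConfig(Exception):
    """Hypothetical fatal error for a guest config the host cannot honour."""


def host_supports(lib_version: int, virt_type: str, min_version: int) -> bool:
    """A version check alone is insufficient: also require QEMU or KVM."""
    return virt_type in SUPPORTED_HYPERVISORS and lib_version >= min_version


def report_host_numa_topology(
    lib_version: int, virt_type: str, cells: List[Dict]
) -> Optional[List[Dict]]:
    """Report only what the host can actually configure for a guest."""
    if not host_supports(lib_version, virt_type, MIN_NUMA_VERSION):
        return None  # no NUMA topology reported to the scheduler at all
    hugepages_ok = host_supports(lib_version, virt_type, MIN_HUGEPAGE_VERSION)
    return [
        {"id": c["id"], "mempages": list(c["mempages"]) if hugepages_ok else []}
        for c in cells
    ]


def build_guest_config(
    lib_version: int, virt_type: str, wants_numa: bool, wants_hugepages: bool
) -> Dict[str, bool]:
    """Fail loudly instead of silently dropping a requested feature."""
    if wants_numa and not host_supports(lib_version, virt_type, MIN_NUMA_VERSION):
        raise UnsupportedHostConfig("NUMA placement requested on unsupported host")
    if wants_hugepages and not host_supports(
        lib_version, virt_type, MIN_HUGEPAGE_VERSION
    ):
        raise UnsupportedHostConfig("huge pages requested on unsupported host")
    return {"numa": wants_numa, "hugepages": wants_hugepages}
```

Gating at the reporting step keeps the scheduler from placing guests on hosts that cannot honour the request, while the check in the config builder acts as a backstop if a guest is placed there anyway.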

Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-2 → 2015.1.0