"mixed" policy calculations don't account for host cells with no shared CPU allocation

Bug #1898272 reported by Stephen Finucane
Affects                   Status       Importance  Assigned to       Milestone
OpenStack Compute (nova)  In Progress  Medium      Stephen Finucane

Bug Description

The 'mixed' CPU policy allows us to use both shared and dedicated CPUs (VCPU and PCPU) in the same instance. The expectation is that both sets of CPUs will use host cores from the same NUMA node(s). The current code appears to do this, at least for single NUMA nodes; however, it does not account for NUMA nodes without any shared CPUs.

# Steps to reproduce

Configure a dual NUMA node host so that all cores from one node are assigned to '[compute] cpu_shared_set', while all the cores from the other node are assigned to '[compute] cpu_dedicated_set'. For example, on a host where cores 0-5 are on node 0, while cores 6-11 are on node 1:

  [compute]
  cpu_shared_set = 0-5
  cpu_dedicated_set = 6-11

Now attempt to boot a guest using the mixed policy, e.g.

  $ openstack flavor create --vcpus 4 --ram 512 --disk 1 \
      --property 'hw:cpu_policy=mixed' --property 'hw:cpu_dedicated_mask=^0' \
      test.mixed
  $ openstack server create --os-compute-api-version=2.latest \
      --flavor test.mixed --image cirros-0.5.1-x86_64-disk --nic none --wait \
      test-server
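
To make the flavor above concrete, here is a minimal sketch of how a mask such as '^0' splits the four guest vCPUs into dedicated and shared sets. The function `dedicated_vcpus` is hypothetical and is not nova's parser; it assumes that a mask made up only of exclusions starts from all vCPUs, which matches the XML shown under "Actual result" (vCPU 0 is the only one without a one-to-one pin).

```python
def dedicated_vcpus(vcpus, mask):
    """Return the guest vCPU ids that `mask` marks as dedicated.

    Hypothetical helper for illustration only. Supports comma-separated
    ids and ranges ("2-3"), with "^" marking an exclusion.
    """
    include, exclude = set(), set()
    for token in mask.split(","):
        token = token.strip()
        negate = token.startswith("^")
        if negate:
            token = token[1:]
        if "-" in token:
            start, end = token.split("-")
            ids = set(range(int(start), int(end) + 1))
        else:
            ids = {int(token)}
        (exclude if negate else include).update(ids)
    if not include:
        # Assumption: a mask with only exclusions starts from every vCPU.
        include = set(range(vcpus))
    return include - exclude


pinned = dedicated_vcpus(4, "^0")
shared = set(range(4)) - pinned
print(sorted(pinned), sorted(shared))
```

With this flavor the guest therefore needs host cores for one shared vCPU and three dedicated vCPUs.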

# Expected result

The instance should fail to schedule as the 'NUMATopologyFilter' should reject the host.
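
The expected rejection can be sketched as a per-node check. This is only an illustration of the condition described in this report, not the actual fitting code in nova: `HostCell` and `cell_fits_mixed` are hypothetical names, and the sketch ignores CPUs already in use, memory, and multi-node guests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostCell:
    """Hypothetical view of one host NUMA node."""
    id: int
    shared: frozenset     # cores on this node listed in cpu_shared_set
    dedicated: frozenset  # cores on this node listed in cpu_dedicated_set


def cell_fits_mixed(cell, n_shared, n_dedicated):
    """Return True if the node can host both kinds of guest vCPU."""
    # A node with no shared cores cannot host the guest's shared vCPUs.
    if n_shared and not cell.shared:
        return False
    # The node must also offer enough dedicated cores for pinning.
    return len(cell.dedicated) >= n_dedicated


# Host layout from the steps to reproduce.
cells = [
    HostCell(0, frozenset(range(0, 6)), frozenset()),
    HostCell(1, frozenset(), frozenset(range(6, 12))),
]

# The test.mixed flavor needs 1 shared and 3 dedicated vCPUs.
candidates = [c.id for c in cells if cell_fits_mixed(c, 1, 3)]
print(candidates)  # []
```

Neither node qualifies: node 0 has no dedicated cores and node 1 has no shared cores, so a filter applying this check would reject the host.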

# Actual result

The instance is scheduled but fails to boot since the following invalid XML snippet is generated:

  <cputune>
    <shares>4096</shares>
    <emulatorpin cpuset="0-1,4"/>
    <vcpupin vcpu="0" cpuset=""/>  <!-- here: empty cpuset -->
    <vcpupin vcpu="1" cpuset="0"/>
    <vcpupin vcpu="2" cpuset="1"/>
    <vcpupin vcpu="3" cpuset="4"/>
  </cputune>

This results in the following traceback in the nova-compute logs.

  ERROR nova.compute.manager [instance: ...] Traceback (most recent call last):
  ERROR nova.compute.manager [instance: ...] File "/opt/stack/nova/nova/compute/manager.py", line 2625, in _build_resources
  ERROR nova.compute.manager [instance: ...] yield resources
  ERROR nova.compute.manager [instance: ...] File "/opt/stack/nova/nova/compute/manager.py", line 2398, in _build_and_run_instance
  ERROR nova.compute.manager [instance: ...] accel_info=accel_info)
  ERROR nova.compute.manager [instance: ...] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3752, in spawn
  ERROR nova.compute.manager [instance: ...] cleanup_instance_disks=created_disks)
  ERROR nova.compute.manager [instance: ...] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6749, in _create_guest_with_network
  ERROR nova.compute.manager [instance: ...] cleanup_instance_disks=cleanup_instance_disks)
  ERROR nova.compute.manager [instance: ...] File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  ERROR nova.compute.manager [instance: ...] self.force_reraise()
  ERROR nova.compute.manager [instance: ...] File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  ERROR nova.compute.manager [instance: ...] six.reraise(self.type_, self.value, self.tb)
  ERROR nova.compute.manager [instance: ...] File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
  ERROR nova.compute.manager [instance: ...] raise value
  ERROR nova.compute.manager [instance: ...] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6718, in _create_guest_with_network
  ERROR nova.compute.manager [instance: ...] post_xml_callback=post_xml_callback)
  ERROR nova.compute.manager [instance: ...] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6643, in _create_guest
  ERROR nova.compute.manager [instance: ...] guest = libvirt_guest.Guest.create(xml, self._host)
  ERROR nova.compute.manager [instance: ...] File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 145, in create
  ERROR nova.compute.manager [instance: ...] encodeutils.safe_decode(xml))
  ERROR nova.compute.manager [instance: ...] File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  ERROR nova.compute.manager [instance: ...] self.force_reraise()
  ERROR nova.compute.manager [instance: ...] File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  ERROR nova.compute.manager [instance: ...] six.reraise(self.type_, self.value, self.tb)
  ERROR nova.compute.manager [instance: ...] File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
  ERROR nova.compute.manager [instance: ...] raise value
  ERROR nova.compute.manager [instance: ...] File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 141, in create
  ERROR nova.compute.manager [instance: ...] guest = host.write_instance_config(xml)
  ERROR nova.compute.manager [instance: ...] File "/opt/stack/nova/nova/virt/libvirt/host.py", line 1144, in write_instance_config
  ERROR nova.compute.manager [instance: ...] domain = self.get_connection().defineXML(xml)
  ERROR nova.compute.manager [instance: ...] File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 190, in doit
  ERROR nova.compute.manager [instance: ...] result = proxy_call(self._autowrap, f, *args, **kwargs)
  ERROR nova.compute.manager [instance: ...] File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 148, in proxy_call
  ERROR nova.compute.manager [instance: ...] rv = execute(f, *args, **kwargs)
  ERROR nova.compute.manager [instance: ...] File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 129, in execute
  ERROR nova.compute.manager [instance: ...] six.reraise(c, e, tb)
  ERROR nova.compute.manager [instance: ...] File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
  ERROR nova.compute.manager [instance: ...] raise value
  ERROR nova.compute.manager [instance: ...] File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 83, in tworker
  ERROR nova.compute.manager [instance: ...] rv = meth(*args, **kwargs)
  ERROR nova.compute.manager [instance: ...] File "/usr/local/lib/python3.6/dist-packages/libvirt.py", line 3703, in defineXML
  ERROR nova.compute.manager [instance: ...] if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)
  ERROR nova.compute.manager [instance: ...] libvirt.libvirtError: invalid argument: Failed to parse bitmap ''
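
The final error comes from libvirt being handed an empty cpuset string. As a rough illustration of a defensive check (a sketch with a hypothetical helper, `format_cpuset`, not nova's own formatting code), a formatter could refuse to produce an empty string so that the problem surfaces before the XML is defined:

```python
def format_cpuset(cpus):
    """Format host core ids as a cpuset string such as '0-1,4'.

    Hypothetical helper for illustration only. Raises ValueError for an
    empty set instead of returning '', which libvirt cannot parse.
    """
    if not cpus:
        raise ValueError("refusing to emit an empty cpuset")
    ordered = sorted(cpus)
    ranges = []
    start = prev = ordered[0]
    for cpu in ordered[1:]:
        if cpu == prev + 1:
            # Still inside a run of consecutive core ids.
            prev = cpu
            continue
        ranges.append((start, prev))
        start = prev = cpu
    ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


print(format_cpuset({0, 1, 4}))
```

A guard like this only moves the failure earlier on the compute node; the scheduling-time rejection described under "Expected result" is still what prevents the instance from landing on the host at all.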

Tags: libvirt numa
Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Stephen Finucane (stephenfinucane)
tags: added: libvirt numa
summary: - "mixed" policy calculations don't account for host cells with no free
- shared CPUs
+ "mixed" policy calculations don't account for host cells with no shared
+ CPU allocation
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/756100

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/756101

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 23.0.0.0rc1

This issue was fixed in the openstack/nova 23.0.0.0rc1 release candidate.
