'hw:cpu_realtime_mask' extra spec is not validated

Bug #1884231 reported by Stephen Finucane
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Stephen Finucane

Bug Description

The 'hw:cpu_realtime_mask' extra spec is (currently) used to specify which of an instance's cores should *not* be part of its realtime set of cores. Currently, this is mandatory and omitting it will cause an HTTP 400 error. For example:

  $ openstack flavor create --ram 512 --disk 1 --vcpus 2 \
      --property hw:cpu_policy=dedicated \
      --property hw:cpu_realtime=yes \
      test.rt

will fail with:

  Realtime policy needs vCPU(s) mask configured with at least 1 RT vCPU and 1 ordinary vCPU. See hw:cpu_realtime_mask or hw_cpu_realtime_mask

Similarly, attempting to mask *all* values will result in a failure. For example:

  $ openstack flavor create --ram 512 --disk 1 --vcpus 2 \
      --property hw:cpu_policy=dedicated \
      --property hw:cpu_realtime=yes \
      --property hw:cpu_realtime_mask='^0-1' \
      test.rt

will also fail with:

  Realtime policy needs vCPU(s) mask configured with at least 1 RT vCPU and 1 ordinary vCPU. See hw:cpu_realtime_mask or hw_cpu_realtime_mask

However, the value is otherwise unvalidated by nova, which can cause libvirt to explode when specific values are passed. For example, consider the following flavor:

  $ openstack flavor create --ram 512 --disk 1 --vcpus 2 \
      --property hw:cpu_policy=dedicated \
      --property hw:cpu_realtime=yes \
      --property hw:cpu_realtime_mask='^2' \
      test.rt

This states that the instance should have two cores, and that some imaginary third core (masks are 0-indexed) will be the non-realtime one. This is clearly nonsensical and, sure enough, creating an instance using this flavor causes things to go bang:

  Failed to build and run instance: libvirt.libvirtError: invalid argument: Failed to parse bitmap ''
  Traceback (most recent call last):
    File "/opt/stack/nova/nova/compute/manager.py", line 2378, in _build_and_run_instance
      accel_info=accel_info)
    File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3702, in spawn
      cleanup_instance_disks=created_disks)
    File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6664, in _create_domain_and_network
      cleanup_instance_disks=cleanup_instance_disks)
    File "/usr/local/lib/python3.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
      self.force_reraise()
    File "/usr/local/lib/python3.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
      six.reraise(self.type_, self.value, self.tb)
    File "/usr/local/lib/python3.7/site-packages/six.py", line 703, in reraise
      raise value
    File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6633, in _create_domain_and_network
      post_xml_callback=post_xml_callback)
    File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6559, in _create_domain
      guest = libvirt_guest.Guest.create(xml, self._host)
    File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 127, in create
      encodeutils.safe_decode(xml))
    File "/usr/local/lib/python3.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
      self.force_reraise()
    File "/usr/local/lib/python3.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
      six.reraise(self.type_, self.value, self.tb)
    File "/usr/local/lib/python3.7/site-packages/six.py", line 703, in reraise
      raise value
    File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 123, in create
      guest = host.write_instance_config(xml)
    File "/opt/stack/nova/nova/virt/libvirt/host.py", line 1141, in write_instance_config
      domain = self.get_connection().defineXML(xml)
    File "/usr/local/lib/python3.7/site-packages/eventlet/tpool.py", line 190, in doit
      result = proxy_call(self._autowrap, f, *args, **kwargs)
    File "/usr/local/lib/python3.7/site-packages/eventlet/tpool.py", line 148, in proxy_call
      rv = execute(f, *args, **kwargs)
    File "/usr/local/lib/python3.7/site-packages/eventlet/tpool.py", line 129, in execute
      six.reraise(c, e, tb)
    File "/usr/local/lib/python3.7/site-packages/six.py", line 703, in reraise
      raise value
    File "/usr/local/lib/python3.7/site-packages/eventlet/tpool.py", line 83, in tworker
      rv = meth(*args, **kwargs)
    File "/usr/local/lib64/python3.7/site-packages/libvirt.py", line 4048, in defineXML
      if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)
  libvirt.libvirtError: invalid argument: Failed to parse bitmap ''

The error happens because libvirt is attempting to configure the set of CPUs on which to pin emulator threads, which in the realtime case is all the non-realtime cores. However, since no cores are set aside for non-realtime purposes (due to the invalid mask), we end up with an empty emulator thread set [1]. One *could* work around this by configuring an emulator thread policy. For example:

  openstack flavor create --ram 512 --disk 1 --vcpus 2 \
    --property 'hw:cpu_policy=dedicated' \
    --property 'hw:emulator_threads_policy=isolate' \
    --property 'hw:cpu_realtime=true' \
    --property 'hw:cpu_realtime_mask=^2' \
    test.rt

Similarly, a user could ensure at least one core in the range is valid:

  openstack flavor create --ram 512 --disk 1 --vcpus 2 \
    --property 'hw:cpu_policy=dedicated' \
    --property 'hw:emulator_threads_policy=isolate' \
    --property 'hw:cpu_realtime=true' \
    --property 'hw:cpu_realtime_mask=^1-5' \
    test.rt
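
To make the failure mode concrete, here is a rough sketch of how an exclusion-only mask splits the vCPUs. This is not nova's actual parsing code: 'split_vcpus' and 'to_bitmap' are hypothetical helpers, and the sketch assumes the mask only contains '^N' or '^N-M' terms and that out-of-range indices are silently ignored, which is what the behaviour described above suggests.

```python
from typing import Set, Tuple


def split_vcpus(vcpus: int, mask: str) -> Tuple[Set[int], Set[int]]:
    """Return (realtime, non_realtime) vCPU sets for an exclusion mask.

    Hypothetical helper: only handles comma-separated '^N' / '^N-M' terms.
    """
    excluded: Set[int] = set()
    for term in mask.split(","):
        term = term.strip().lstrip("^")
        low, _, high = term.partition("-")
        excluded.update(range(int(low), int(high or low) + 1))

    all_vcpus = set(range(vcpus))
    # Assumption: indices beyond the flavor's vCPU count simply vanish here
    # instead of being rejected.
    non_realtime = all_vcpus & excluded
    return all_vcpus - non_realtime, non_realtime


def to_bitmap(cpus: Set[int]) -> str:
    """Render a CPU set as a comma-separated string (empty set -> '')."""
    return ",".join(str(cpu) for cpu in sorted(cpus))


if __name__ == "__main__":
    for mask in ("^2", "^1-5", "^0-1"):
        realtime, non_realtime = split_vcpus(2, mask)
        print(mask, sorted(realtime), repr(to_bitmap(non_realtime)))
```

With two vCPUs, '^2' leaves the non-realtime set empty, so the string handed over for emulator thread pinning would be '', while '^1-5' is quietly truncated to vCPU 1.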

However, both cases are still wrong and the 'hw:cpu_realtime_mask' value is almost certainly user error. Nova should validate the mask properly and reject invalid values. We could probably also look at dropping the requirement to specify 'hw:cpu_realtime_mask' if 'hw:emulator_threads_policy' is configured, but that's more of a feature than a bug.
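
As a sketch of the kind of stricter check being asked for (not the fix that was eventually merged, and not nova's API: 'validate_realtime_mask' is a hypothetical name, and only '^N' / '^N-M' terms are assumed to be supported), something along these lines would reject all of the bad masks above before anything reaches libvirt:

```python
import re
from typing import Set

# Assumption: a mask is a comma-separated list of '^N' or '^N-M' terms.
_TERM = re.compile(r"\^(\d+)(?:-(\d+))?")


def validate_realtime_mask(vcpus: int, mask: str) -> Set[int]:
    """Return the non-realtime vCPU set, or raise ValueError if invalid."""
    excluded: Set[int] = set()
    for term in mask.split(","):
        match = _TERM.fullmatch(term.strip())
        if match is None:
            raise ValueError(f"unsupported mask term: {term!r}")
        low = int(match.group(1))
        high = int(match.group(2)) if match.group(2) is not None else low
        if low > high:
            raise ValueError(f"descending range in mask term: {term!r}")
        excluded.update(range(low, high + 1))

    # Reject references to vCPUs the flavor does not have (e.g. '^2', '^1-5').
    out_of_range = sorted(cpu for cpu in excluded if cpu >= vcpus)
    if out_of_range:
        raise ValueError(f"mask references non-existent vCPUs: {out_of_range}")

    # Every accepted term excludes at least one vCPU, so a non-realtime vCPU
    # is guaranteed here; make sure a realtime one is left over too.
    if len(excluded) == vcpus:
        raise ValueError("mask leaves no realtime vCPU")
    return excluded
```

For a two-vCPU flavor, this accepts '^0' or '^1' and raises for '^2', '^1-5', '^0-1' and an empty mask.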

Tags: libvirt
Changed in nova:
importance: Undecided → Medium
assignee: nobody → Stephen Finucane (stephenfinucane)
status: New → Confirmed
description: updated
tags: added: libvi
tags: added: libvirt
removed: libvi
Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/468203
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b06597418799b382d091bd2ff4b747e6f5a10d96
Submitter: Zuul
Branch: master

commit b06597418799b382d091bd2ff4b747e6f5a10d96
Author: Chris Friesen <email address hidden>
Date: Thu May 25 17:13:22 2017 -0600

    hardware: Add validation for 'cpu_realtime_mask'

    It's possible to specify some strange values for the cpu realtime mask
    and the code won't currently complain, so let's make it a bit more
    strict.

    Change-Id: I0ba06529affe5b48af5ac37bc24242dffdac77d3
    Closes-Bug: #1884231

Changed in nova:
status: In Progress → Fix Released