Nova assumes hw_firmware_type=uefi being set on UEFI instances

Bug #1868464 reported by Marcin Juszkiewicz
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Low
Assigned to: Stephen Finucane

Bug Description

During the Queens cycle we made hw_firmware_type default to uefi on the aarch64 architecture (as it is the only sane way to boot an instance there).

Since then, images do not need to have the 'hw_firmware_type' property set to get instances working.

The problem shows up when an instance fails to start for some other reason:

2020-03-21 19:35:04.451 6 ERROR nova.compute.manager [req-9a494e07-cfd7-49ca-87d2-bbe3ac86bc2c 9f732df1c71f4788b1b834c07511bc53 acc9e58c6c514ac1af7d3c28e9d690ee - default default] [instance: 09554b80-bb49-47bb-bbd3-36d651ab4655] Instance failed to spawn: libvirt.libvirtError: Requested operation is not valid: cannot undefine domain with nvram

I went through the code and found that when a machine starts on aarch64, hw_firmware_type=UEFI is set in code by default (since Queens). It is not stored in the image metadata at all (as checked with 'image show IMAGENAME').

But when I boot such an image and the instance fails to start, it looks like nova does not set the flag to remove nvram. I looked at libvirt/driver.py and it appears to assume that the instance/image will have hw_firmware_type=UEFI set ;(

When I added hw_firmware_type=uefi to the image, the problem was gone. But that is not a proper solution.
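
To make the assumption concrete, here is a minimal sketch of the cleanup decision as described above. It is not the nova code itself: `should_remove_nvram` and its parameters are hypothetical names used only for illustration, and the image properties are modelled as a plain dict.

```python
# Hypothetical, simplified stand-in for the check in libvirt/driver.py.
UEFI = "uefi"


def should_remove_nvram(host_has_uefi_support, image_properties):
    """Decide NVRAM removal from the image property only (the reported behaviour)."""
    hw_firmware_type = image_properties.get("hw_firmware_type")
    return host_has_uefi_support and hw_firmware_type == UEFI


# AArch64 image without the property: the guest still boots with UEFI,
# but this check concludes that no NVRAM needs to be removed.
no_property = should_remove_nvram(True, {})

# Same image after adding hw_firmware_type=uefi (the workaround).
with_property = should_remove_nvram(True, {"hw_firmware_type": "uefi"})

print(no_property, with_property)  # False True
```

The first case is the one that ends in "cannot undefine domain with nvram": the domain has NVRAM, but the undefine call is made without asking for it to be removed.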

Something like this probably needs to be done in all places where it is checked:

diff --git nova/virt/libvirt/driver.py nova/virt/libvirt/driver.py
index 45af21c3bb..0f293bbea8 100644
--- nova/virt/libvirt/driver.py
+++ nova/virt/libvirt/driver.py
@@ -1270,11 +1270,7 @@ class LibvirtDriver(driver.ComputeDriver):
         try:
             guest = self._host.get_guest(instance)
             try:
-                hw_firmware_type = instance.image_meta.properties.get(
-                    'hw_firmware_type')
-                support_uefi = (self._has_uefi_support() and
-                    hw_firmware_type == fields.FirmwareType.UEFI)
-                guest.delete_configuration(support_uefi)
+                guest.delete_configuration(self._has_uefi_support())
             except libvirt.libvirtError as e:
                 with excutils.save_and_reraise_exception() as ctxt:
                     errcode = e.get_error_code()

Or maybe set the NVRAM flag every time, since 'hw_firmware_type' cannot be assumed to be set.

Tags: compute
Marcin Juszkiewicz (hrw) wrote :

I am waiting for the day when the x86 virtual world finally moves from i440fx/BIOS to q35/UEFI and gets hit by all the ignored issues.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/714311

Changed in nova:
assignee: nobody → Marcin Juszkiewicz (hrw)
status: New → In Progress
Radosław Piliszek (yoctozepto) wrote :

Re comment #1 - I guess it would be interesting to have and use a switch that defaults nova to UEFI and a modern platform (pcie, I'm looking at you) even on x86_64.

Changed in nova:
assignee: Marcin Juszkiewicz (hrw) → Kevin Zhao (kevin-zhao)
Changed in nova:
assignee: Kevin Zhao (kevin-zhao) → Stephen Finucane (stephenfinucane)
Changed in nova:
importance: Undecided → Low
tags: added: compute
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/714311
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a18dbb5670da0c0b63c4bd69055262b4b7b17e58
Submitter: Zuul
Branch: master

commit a18dbb5670da0c0b63c4bd69055262b4b7b17e58
Author: Marcin Juszkiewicz <email address hidden>
Date: Mon Mar 23 12:07:02 2020 +0100

    libvirt: Change UEFI check to handle AArch64 better

    Nova assumes that images run on UEFI instances will have
    'hw_firmware_type' property set. This is wrong assumption because UEFI
    is the default on AArch64, meaning images do not need to have this set.
    This results in a failure to remove the instance, with libvirt raising
    the following error:

      Instance failed to spawn: libvirt.libvirtError: Requested operation
      is not valid: cannot undefine domain with nvram

    Resolve this by checking *both* the image metadata property and the
    machine arch.

    Change-Id: I2956fe2e3582c36d1c52a7e3becde1dacd9d41f0
    Closes-bug: #1868464
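
The commit message says the fix checks both the image metadata property and the machine arch. A minimal sketch of such a combined check follows. It assumes plain strings for the firmware type and architecture; `uefi_in_use` and its parameters are hypothetical names, not the signature used in the merged change.

```python
# Hypothetical helper illustrating "check both the image property and the arch".
UEFI = "uefi"
AARCH64 = "aarch64"


def uefi_in_use(host_has_uefi_support, hw_firmware_type, host_arch):
    """Treat the guest as UEFI if the image requests it or the arch defaults to it."""
    return host_has_uefi_support and (
        hw_firmware_type == UEFI or host_arch == AARCH64
    )


# AArch64 image without the property is now recognised as a UEFI guest.
print(uefi_in_use(True, None, "aarch64"))   # True
# x86_64 image without the property keeps the previous behaviour.
print(uefi_in_use(True, None, "x86_64"))    # False
```

Compared with passing `self._has_uefi_support()` unconditionally, as in the diff proposed in the description, a check of this shape leaves x86_64 guests that do not request UEFI unaffected.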

Changed in nova:
status: In Progress → Fix Released