nova delete doesn't work with EFI booted VMs

Bug #1567807 reported by Brad Marshall on 2016-04-08
This bug affects 7 people
Affects                    Importance  Assigned to
OpenStack Compute (nova)   Low         Derek Higgins
Ubuntu Cloud Archive       High        Unassigned
  Mitaka                   High        Unassigned
  Newton                   High        Unassigned
  Ocata                    High        Unassigned
nova (Ubuntu)              Medium      Kevin Zhao
  Xenial                   Medium      Chuck Short
  Yakkety                  High        Unassigned
  Zesty                    Medium      Kevin Zhao

Bug Description

I've been setting up a Mitaka OpenStack using the cloud archive running on Trusty, and am having problems working with EFI-enabled instances on ARM64.

I've done some work with wgrant and gotten things to a stage where I can boot instances, using the aavmf images.

However, when I tried to delete a VM booted like this, I get an error:

  libvirtError: Requested operation is not valid: cannot delete inactive domain with nvram

I've included the full traceback at https://paste.ubuntu.com/15682718/.

Thanks to a suggestion from wgrant again, I got it working by editing nova/virt/libvirt/guest.py in delete_configuration() and replacing self._domain.undefineFlags(libvirt.VIR_DOMAIN_UNDEFINE_MANAGED_SAVE) with self._domain.undefineFlags(libvirt.VIR_DOMAIN_UNDEFINE_MANAGED_SAVE | libvirt.VIR_DOMAIN_UNDEFINE_NVRAM).
I've attached a rough patch.

Once that's applied and nova-compute restarted, I was able to delete the instance fine.
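For anyone following along without the attachment, here is a minimal sketch of what the change amounts to. It is not the attached patch: FakeDomain is a hypothetical stand-in for libvirt.virDomain so the snippet runs without libvirt installed, and the two constants are redefined locally with the values libvirt assigns to them. In nova the call goes through self._domain inside Guest.delete_configuration().

```python
# Flag values mirror libvirt's virDomainUndefineFlagsValues enum; real code
# would read them from the libvirt module instead of redefining them.
VIR_DOMAIN_UNDEFINE_MANAGED_SAVE = 1 << 0
VIR_DOMAIN_UNDEFINE_NVRAM = 1 << 2


class FakeDomain:
    """Hypothetical stand-in for libvirt.virDomain, for illustration only."""

    def __init__(self, has_nvram):
        self.has_nvram = has_nvram
        self.defined = True

    def undefineFlags(self, flags):
        # Mimic the refusal reported above when the NVRAM flag is missing.
        if self.has_nvram and not flags & VIR_DOMAIN_UNDEFINE_NVRAM:
            raise RuntimeError(
                "Requested operation is not valid: "
                "cannot delete inactive domain with nvram")
        self.defined = False


def delete_configuration(domain):
    """Undefine the domain, also asking libvirt to remove its nvram file."""
    domain.undefineFlags(
        VIR_DOMAIN_UNDEFINE_MANAGED_SAVE | VIR_DOMAIN_UNDEFINE_NVRAM)
```

Passing only VIR_DOMAIN_UNDEFINE_MANAGED_SAVE to a FakeDomain created with has_nvram=True raises, which is the behaviour the original code ran into.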

Could someone please investigate this and see if it's the correct fix, and look at getting it fixed in the archive?

This was done on an updated trusty deployment using the cloud-archives for mitaka.

$ dpkg-query -W python-nova
python-nova 2:13.0.0~b2-0ubuntu1~cloud0

Please let me know if you need any further information.

Brad Marshall (brad-marshall) wrote :
William Grant (wgrant) wrote :

To get an arm64 UEFI instance running, grab libvirt (minor apparmor patch) and aavmf (UEFI firmware) from https://launchpad.net/~wgrant/+archive/ubuntu/scalingstack-trusty-mitaka/, then boot a uefi1 arm64 image with the hw_firmware_type=uefi glance property set. But it's probably reproducible with amd64 UEFI instances too.

nova creates a libvirt instance with <loader> and <nvram> elements, and libvirt refuses to implicitly delete the nvram file. nova should probably be taught to force that, which is what my suggested delete_configuration patch does.

Michael Still (mikal) wrote :

Your fix looks reasonable to me, but we'd need a test as well I think. I've tagged this as being of possible backport interest to mitaka as well.

Changed in nova (Ubuntu):
status: New → Confirmed
Changed in nova:
status: New → Triaged
importance: Undecided → Low
tags: added: libvirt mitaka-backport-potential

My understanding is that using the VIR_DOMAIN_UNDEFINE_NVRAM flag[1] is the same as passing the --nvram argument, and according to the documentation for the undefine method this argument[2] removes the nvram file. I still need to check how VMs are created, especially with UEFI, and whether a copy of the nvram file is created for every new instance. So my question is: once this change is applied, is it still possible to create a new instance?

[1] https://github.com/libvirt/libvirt/blob/d9a0a885e2b1cf3c9fc5260f9cdf4fc8a768f26c/tools/virsh-domain.c#L3681-L3682
[2] https://github.com/libvirt/libvirt/blob/d9a0a885e2b1cf3c9fc5260f9cdf4fc8a768f26c/tools/virsh-domain.c#L3605

William Grant (wgrant) wrote :

Yes, it is still possible to create an instance after changing only the delete method. The generated libvirt domain XML looks like this:

  <os>
    <type arch='aarch64' machine='virt'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/AAVMF/AAVMF_CODE.fd</loader>
    <nvram template='/usr/share/AAVMF/AAVMF_CODE.fd'>/var/lib/libvirt/qemu/nvram/instance-00000005_VARS.fd</nvram>
    <boot dev='hd'/>
  </os>

That is, the nvram file is copied from a global template to an instance-specific file -- it's that file under /var/lib/libvirt/qemu/nvram that VIR_DOMAIN_UNDEFINE_NVRAM removes. It should arguably be using template=AAVMF_VARS.fd rather than AAVMF_CODE.fd, but that's unrelated to this and doesn't really matter.

The most annoying part of this change is feature detection. I think delete_configuration might have to know to try unsetting flags until the method works, as older libvirts don't support VIR_DOMAIN_UNDEFINE_NVRAM and I can't see a way to test at runtime. But in our environment we know the libvirt version, so we just pass the flag in unconditionally.
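A rough sketch of the "try unsetting flags until the method works" idea, in case it helps the discussion. Everything here is an assumption rather than nova code: undefine_trying_flags and its parameters are hypothetical names, and the domain argument is anything exposing undefineFlags() and undefine() the way libvirt.virDomain does.

```python
def undefine_trying_flags(domain, managed_save, nvram=None, error=Exception):
    """Try the richest flag combination first, then fall back.

    managed_save and nvram are the integer flag values; nvram may be None
    when the installed binding does not define the constant. error is the
    exception type to treat as "this combination was rejected".
    Returns the flags that worked, or None if plain undefine() was used.
    """
    candidates = []
    if nvram is not None:
        candidates.append(managed_save | nvram)
    candidates.append(managed_save)

    for flags in candidates:
        try:
            domain.undefineFlags(flags)
            return flags
        except error:
            # This combination was rejected; try the next, smaller one.
            continue

    # Last resort: no flags at all. Any error raised here propagates.
    domain.undefine()
    return None
```

With the real bindings this might be called as undefine_trying_flags(dom, libvirt.VIR_DOMAIN_UNDEFINE_MANAGED_SAVE, getattr(libvirt, "VIR_DOMAIN_UNDEFINE_NVRAM", None), libvirt.libvirtError). Note that the getattr only tells you what the Python binding knows about, not what the daemon supports, and that a domain which really has an nvram file will still fail on every fallback step.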

Sean Dague (sdague) wrote :

Could you propose that up as a gerrit review? We don't do patches in the tracker.

James Page (james-page) on 2016-06-09
Changed in nova (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → Low
Raghuram Kota (rkota) on 2016-06-13
tags: added: arm64 hs-arm64
Ching Kuo (gene-kuo) on 2016-06-29
Changed in nova:
assignee: nobody → Ching Kuo (gene-kuo)
Changed in nova (Ubuntu):
assignee: nobody → Ching Kuo (gene-kuo)

Fix proposed to branch: master
Review: https://review.openstack.org/335512

Changed in nova:
status: Triaged → In Progress
Kevin Zhao (kevin-zhao) wrote :

Hi Ching Kuo,
   Are you still working on this?

Ching Kuo (gene-kuo) wrote :

Hi Kevin,

Still working on this bug. Trying to find out why Xen and Hyper-V CI failed.

Kevin Zhao (kevin-zhao) wrote :

Hi Ching,
   How is that patch going?
   We need this fix urgently. If you are not working on this, I can help fix this bug.
   Thanks~

Ching Kuo (gene-kuo) wrote :

Hi Kevin,

I've encountered some problems when running the Xen and Hyper-V CI.
If you have any good ideas maybe we can discuss and fix it together.

Thanks

Kevin Zhao (kevin-zhao) wrote :

Hi Ching,
    I have mentioned this issue to Daniel and he has left some comments on your patch. I think they may help fix the Xen CI problem. Also, since libvirt versions lower than 1.2.8 cannot recognize the nvram parameter, you need to think more about UNDEFINE_NVRAM.

Ching Kuo (gene-kuo) wrote :

Hi Kevin,

I also found that this may be the problem. I'm going to fix it soon.
Thanks for the suggestion.

Marcos Simental (mrkzmrkz) wrote :

Hello Ching,
The workaround we did for Clear Linux about this was something like:

diff --git a/nova/virt/libvirt/guest.py b/nova/virt/libvirt/guest.py
index 263f873..1df0476 100644
--- a/nova/virt/libvirt/guest.py
+++ b/nova/virt/libvirt/guest.py
@@ -199,8 +199,13 @@ class Guest(object):
     def delete_configuration(self):
         """Undefines a domain from hypervisor."""
         try:
-            self._domain.undefineFlags(
-                libvirt.VIR_DOMAIN_UNDEFINE_MANAGED_SAVE)
+            if '<nvram template=' in self._domain.XMLDesc():
+                self._domain.undefineFlags(
+                    libvirt.VIR_DOMAIN_UNDEFINE_NVRAM |
+                    libvirt.VIR_DOMAIN_UNDEFINE_MANAGED_SAVE)
+            else:
+                self._domain.undefineFlags(
+                    libvirt.VIR_DOMAIN_UNDEFINE_MANAGED_SAVE)
         except libvirt.libvirtError:
             LOG.debug("Error from libvirt during undefineFlags. %d"
                       "Retrying with undefine", self.id)

That way the `VIR_DOMAIN_UNDEFINE_NVRAM` flag is passed only if the instance was booted using UEFI (NVRAM).

As you can check in our nova src.rpm in https://download.clearlinux.org/releases/7310/clear/source/SRPMS/nova-12.0.2-133.src.rpm
( you could extract the files by running
$ rpm2cpio https://download.clearlinux.org/releases/7310/clear/source/SRPMS/nova-12.0.2-133.src.rpm | cpio -divm
and then take a look at the 0006-Enable-UEFI-boot-for-kvm-and-qemu.patch )
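As a side note on the substring check in the workaround above: it depends on the exact serialization of the XML (for example, an <nvram> element without a template attribute would not match). A slightly more tolerant variant could parse the XML instead. This is only a sketch under assumptions: domain_uses_nvram is a hypothetical helper, not part of nova or the Clear Linux patch, and it expects the full <domain> document as returned by XMLDesc().

```python
import xml.etree.ElementTree as ET


def domain_uses_nvram(domain_xml):
    """Return True if the domain XML has an <nvram> element under <os>.

    domain_xml is the full document whose root element is <domain>.
    """
    root = ET.fromstring(domain_xml)
    return root.find("./os/nvram") is not None
```

The result could then decide whether VIR_DOMAIN_UNDEFINE_NVRAM is OR-ed into the flags passed to undefineFlags().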

Kevin Zhao (kevin-zhao) on 2016-08-23
Changed in nova:
assignee: Ching Kuo (gene-kuo) → Kevin Zhao (kevin-zhao)
Kevin Zhao (kevin-zhao) on 2016-08-24
Changed in nova (Ubuntu):
assignee: Ching Kuo (gene-kuo) → Kevin Zhao (kevin-zhao)
Kevin Zhao (kevin-zhao) wrote :

Just a reminder that the bug fix is at:
https://review.openstack.org/#/c/357190/
It's strange that it can't be attached here automatically.

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/335512
Reason: This patch has been sitting unchanged for more than 12 weeks. I am therefore going to abandon it to keep the nova review queue sane. Please feel free to restore the change if you're still working on it.

Changed in nova:
assignee: Kevin Zhao (kevin-zhao) → Derek Higgins (derekh)
cargonza (cargonza) wrote :

Hi, any update on the fix being merged soon?

Thanks,
Carlos

Kevin Fox (kevpn) wrote :

+1

Chuck Short (zulcss) wrote :

Linking to new bug so we can track this:

https://review.openstack.org/#/c/357190/

Ryan Beisner (1chb1n) on 2017-01-18
tags: added: uosci
Kevin Zhao (kevin-zhao) wrote :

https://review.openstack.org/#/c/357190/

The patch set has been set to "workflow +1" but has not merged yet.
Is anything else needed before merging?

Ryan Beisner (1chb1n) wrote :

FWIW, confirmed issue on Xenial-Mitaka aarch64:

$ openstack server list
+--------------------------------------+-----------------------------+---------+------------------------------------+------------+
| ID | Name | Status | Networks | Image Name |
+--------------------------------------+-----------------------------+---------+------------------------------------+------------+
| 58d20cf7-a5c4-4845-8f26-47222513dfba | xenial-uefi-20170119b221218 | ACTIVE | private=172.16.0.15, 10.111.222.70 | |
| 3fd7e3ae-c710-44d2-9a81-3077ed89b196 | xenial-uefi-20170119b221212 | ACTIVE | private=172.16.0.14, 10.111.222.67 | |
| d1a8d761-44e4-464f-8056-bf58bc8b2407 | xenial-uefi-20170119b221205 | ACTIVE | private=172.16.0.13, 10.111.222.66 | |
| df488910-3151-43c1-9de9-794d5095b66d | xenial-uefi-20170119212539 | SHUTOFF | private=172.16.0.12, 10.111.222.65 | |
| d7a46c4b-2b63-4a8d-9ab7-dfcbf9e7032f | xenial-uefi-20170119212534 | SHUTOFF | private=172.16.0.11, 10.111.222.69 | |
| 5d1275ef-9346-4598-8654-10a4b0b8da47 | xenial-uefi-20170119212528 | SHUTOFF | private=172.16.0.10, 10.111.222.68 | |
+--------------------------------------+-----------------------------+---------+------------------------------------+------------+

.

$ for i in $(openstack server list | grep uefi | awk '{ print $2 }'); do echo $i; openstack server delete $i; done
58d20cf7-a5c4-4845-8f26-47222513dfba
3fd7e3ae-c710-44d2-9a81-3077ed89b196
d1a8d761-44e4-464f-8056-bf58bc8b2407
df488910-3151-43c1-9de9-794d5095b66d
d7a46c4b-2b63-4a8d-9ab7-dfcbf9e7032f
5d1275ef-9346-4598-8654-10a4b0b8da47

.

$ openstack server list
+--------------------------------------+-----------------------------+--------+----------+------------+
| ID | Name | Status | Networks | Image Name |
+--------------------------------------+-----------------------------+--------+----------+------------+
| 58d20cf7-a5c4-4845-8f26-47222513dfba | xenial-uefi-20170119b221218 | ERROR | | |
| 3fd7e3ae-c710-44d2-9a81-3077ed89b196 | xenial-uefi-20170119b221212 | ERROR | | |
| d1a8d761-44e4-464f-8056-bf58bc8b2407 | xenial-uefi-20170119b221205 | ERROR | | |
| df488910-3151-43c1-9de9-794d5095b66d | xenial-uefi-20170119212539 | ERROR | | |
+--------------------------------------+-----------------------------+--------+----------+------------+

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2510, in do_terminate_instance
    self._delete_instance(context, instance, bdms, quotas)
  File "/usr/lib/python2.7/dist-packages/nova/hooks.py", line 154, in inner
    rv = f(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2473, in _delete_instance
    quotas.rollback()
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in f...


Reviewed: https://review.openstack.org/357190
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=539d381434ccadcdc3f5d58c2705c35558a3a065
Submitter: Jenkins
Branch: master

commit 539d381434ccadcdc3f5d58c2705c35558a3a065
Author: Kevin Zhao <email address hidden>
Date: Thu Jan 5 21:32:41 2017 +0000

    libvirt: fix nova can't delete the instance with nvram

    Currently libvirt needs a flag when deleting an VM with a nvram file,
    without which nova can't delete an instance booted with UEFI. Add
    deletion flag for NVRAM. Also add a test case.

    Co-authored-by: Derek Higgins <email address hidden>
    Change-Id: I46baa952b6c3a1a4c5cf2660931f317cafb5757d
    Closes-Bug: #1567807

Changed in nova:
status: In Progress → Fix Released
Corey Bryant (corey.bryant) wrote :

Chuck has started the backport to stable/newton upstream. We won't be able to get the patch backported upstream to stable/mitaka at this point since they're only accepting critical/security fixes at this time. The patch appears to apply cleanly to stable/mitaka.

Changed in nova (Ubuntu Xenial):
status: New → Triaged
importance: Undecided → Medium
Changed in nova (Ubuntu Yakkety):
importance: Undecided → Medium
Changed in nova (Ubuntu Zesty):
importance: Low → Medium
Changed in nova (Ubuntu Yakkety):
status: New → Triaged
importance: Medium → High
Corey Bryant (corey.bryant) wrote :

So we'll have to carry the patch in our mitaka package.

This issue was fixed in the openstack/nova 15.0.0.0rc1 release candidate.

Reviewed: https://review.openstack.org/428314
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bf6c5ba86509c1e162b1f9d23c7cab603abce914
Submitter: Jenkins
Branch: stable/newton

commit bf6c5ba86509c1e162b1f9d23c7cab603abce914
Author: Kevin Zhao <email address hidden>
Date: Thu Jan 5 21:32:41 2017 +0000

    libvirt: fix nova can't delete the instance with nvram

    Currently libvirt needs a flag when deleting an VM with a nvram file,
    without which nova can't delete an instance booted with UEFI. Add
    deletion flag for NVRAM. Also add a test case.

    Co-authored-by: Derek Higgins <email address hidden>
    Change-Id: I46baa952b6c3a1a4c5cf2660931f317cafb5757d
    Closes-Bug: #1567807
    (cherry picked from commit 539d381434ccadcdc3f5d58c2705c35558a3a065)

This issue was fixed in the openstack/nova 14.0.4 release.

Chuck Short (zulcss) on 2017-03-01
Changed in nova (Ubuntu Xenial):
assignee: nobody → Chuck Short (zulcss)
Changed in nova (Ubuntu Xenial):
status: Triaged → Incomplete
James Page (james-page) wrote :

Marking zesty and Ocata UCA tasks as fix committed as included in 15.0.0~rc1

Changed in nova (Ubuntu Zesty):
status: Triaged → Fix Released
James Page (james-page) wrote :

The fix for this is included in nova 14.0.4, which is currently in yakkety-proposed; however, it's not referenced in the changelog, so I'll mark tasks related to Yakkety/Newton as Fix Committed.

Changed in nova (Ubuntu Yakkety):
status: Triaged → Fix Committed
James Page (james-page) wrote :

SRU information for Mitaka/Xenial (and Newton/Yakkety):

[Impact]
Users of EFI-enabled instances on the arm64 architecture are unable to destroy running instances on OpenStack clouds.

[Test Case]
boot an instance on an arm64-based OpenStack cloud using an EFI-enabled image
nova delete <instance-uuid> - fails

[Regression Potential]
Fix in latest OpenStack release and backported upstream to the Newton release; Mitaka is closed for anything other than security/critical bug fixes upstream so holding a patch in packaging for Xenial. Overall low potential for regression.

James Page (james-page) on 2017-03-15
Changed in nova (Ubuntu Xenial):
status: Incomplete → Triaged

Hello Brad, or anyone else affected,

Accepted nova into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nova/2:13.1.3-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in nova (Ubuntu Xenial):
status: Triaged → Fix Committed
tags: added: verification-needed
James Page (james-page) wrote :

Hello Brad, or anyone else affected,

Accepted nova into mitaka-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:mitaka-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-mitaka-needed to verification-mitaka-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-mitaka-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-mitaka-needed
Ryan Beisner (1chb1n) wrote :

Confirmed fixed @ xenial-newton for nova 14.0.4-0ubuntu1.2~cloud0.

Ryan Beisner (1chb1n) wrote :

Confirmed fixed @ xenial-mitaka for nova 13.1.3-0ubuntu1.

James Page (james-page) on 2017-03-24
tags: added: verification-done verification-mitaka-done
removed: verification-mitaka-needed verification-needed

The verification of the Stable Release Update for nova has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2:13.1.3-0ubuntu1

---------------
nova (2:13.1.3-0ubuntu1) xenial; urgency=medium

  * New upstream point release for OpenStack Mitaka. (LP: #1668313)
  * d/patches/uefi-delete-instances.patch: Fix deletion of instances
    with UEFI is enabled. (LP: #1567807)

 -- Chuck Short <email address hidden> Wed, 01 Mar 2017 08:44:03 -0500

Changed in nova (Ubuntu Xenial):
status: Fix Committed → Fix Released
James Page (james-page) wrote :

The verification of the Stable Release Update for nova has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package nova - 2:13.1.3-0ubuntu1~cloud0
---------------

 nova (2:13.1.3-0ubuntu1~cloud0) trusty-mitaka; urgency=medium
 .
   * New upstream release for the Ubuntu Cloud Archive.
 .
 nova (2:13.1.3-0ubuntu1) xenial; urgency=medium
 .
   * New upstream point release for OpenStack Mitaka. (LP: #1668313)
   * d/patches/uefi-delete-instances.patch: Fix deletion of instances
     with UEFI is enabled. (LP: #1567807)
