virt-manager fails to delete /var/lib/libvirt/qemu/nvram/VMName_VARS.fd after installation is cancelled.

Bug #1701344 reported by Mike Matera
This bug affects 1 person
Affects: virt-manager (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hello,

When a VM using UEFI is created through the wizard and the wizard is cancelled at the "customize hardware" step, an artifact is left behind in /var/lib/libvirt/qemu/nvram. In my case this made it permanently impossible to create a new VM with the same name.

Steps:

1. Create an x86_64 VM named "TEST123" and make a mistake: Supply AARCH64 firmware. VM startup will fail with an error. Cancel installation and the VM is not in the list.
2. Do step 1 over but give it the correct firmware. The VM will fail to start up with a firmware-related failure. The exception names the file /var/lib/libvirt/qemu/nvram/TEST124_VARS.fd
3. Delete /var/lib/libvirt/qemu/nvram/TEST124_VARS.fd
4. Repeat step 2. It works.
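
A leftover file like this can be spotted by comparing the NVRAM directory with the names of the domains that are still defined. The following is a minimal Python sketch, assuming the `<name>_VARS.fd` naming seen in the exception. The function name `stale_nvram_files` is hypothetical, and collecting the defined names (for example from `virsh list --all --name`) is left to the caller.

```python
from pathlib import Path

NVRAM_SUFFIX = "_VARS.fd"  # naming pattern seen in the error message


def stale_nvram_files(nvram_dir, defined_domains):
    """Return NVRAM files in nvram_dir whose domain name is not defined.

    defined_domains is an iterable of domain names; how it is collected
    is an assumption left to the caller.
    """
    defined = set(defined_domains)
    stale = []
    for path in sorted(Path(nvram_dir).glob("*" + NVRAM_SUFFIX)):
        # Strip the suffix to recover the domain name the file belongs to.
        domain = path.name[: -len(NVRAM_SUFFIX)]
        if domain not in defined:
            stale.append(path)
    return stale
```

Calling `stale_nvram_files("/var/lib/libvirt/qemu/nvram", names)` would list files such as TEST124_VARS.fd if they were left behind; reading that directory normally requires root.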

This could make a person crazy for a while and at first glance appears to be a firmware bug. I was saved by my love of deleting things.

Thank you!

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: virt-manager 1:1.3.2-3ubuntu1.16.04.3
ProcVersionSignature: Ubuntu 4.4.0-79.100-generic 4.4.67
Uname: Linux 4.4.0-79-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.6
Architecture: amd64
CurrentDesktop: Unity
Date: Thu Jun 29 11:46:24 2017
EcryptfsInUse: Yes
InstallationDate: Installed on 2017-05-23 (37 days ago)
InstallationMedia: Ubuntu 16.04 LTS "Xenial Xerus" - Release amd64 (20160420.1)
PackageArchitecture: all
SourcePackage: virt-manager
UpgradeStatus: No upgrade log present (probably fresh install)

Christian Ehrhardt  (paelzer) wrote :

Very interesting, thank you Mike for reporting that.

I never drove this through virt-manager, but I've seen something similar on the command line.

I happened to get it working via:
 $ virsh undefine --nvram testguest

Now maybe the cancel/delete from virt-manager does not do so.

Could you try the following:
1. If the guest is still defined after the cancel, undefine it on the command line with
   $ virsh undefine --nvram TEST123
   Check if that did remove the file
2. If the cancel did remove the definition, please try to find a way to abort it hard. We need to
   find out if virt-manager needs to call differently or if in your case the "undefine including
   nvram" fails to remove the file.
   Once you have aborted, do the undefine as mentioned above and check if the file was removed.

If the file was not removed we might also want to check dmesg for AppArmor denials.

Also there is this code in virt-manager:
        if force:
            flags |= getattr(libvirt,
                             "VIR_DOMAIN_UNDEFINE_SNAPSHOTS_METADATA", 0)
            flags |= getattr(libvirt, "VIR_DOMAIN_UNDEFINE_MANAGED_SAVE", 0)
            if (self.get_xmlobj().os.loader_ro is True and
                self.get_xmlobj().os.loader_type == "pflash"):
                flags |= getattr(libvirt, "VIR_DOMAIN_UNDEFINE_NVRAM", 0)
        try:
            self._backend.undefineFlags(flags)

That reads like the right thing, but it depends on the force flag.
Not sure if that is set on the path you trigger.
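
A standalone sketch of the same flag logic makes the dependency on `force` concrete. The `FLAGS` dictionary is a stand-in for the `libvirt` module so the snippet runs without the bindings, and its values are assumed to mirror the libvirt enum. `undefine_flags` and `requests_nvram_removal` are hypothetical names, not virt-manager API.

```python
# Stand-in for the libvirt module constants (assumed values; the quoted
# code looks them up with getattr(libvirt, ..., 0) instead).
FLAGS = {
    "VIR_DOMAIN_UNDEFINE_MANAGED_SAVE": 1,
    "VIR_DOMAIN_UNDEFINE_SNAPSHOTS_METADATA": 2,
    "VIR_DOMAIN_UNDEFINE_NVRAM": 4,
}


def undefine_flags(force, loader_ro, loader_type):
    """Compute the undefine flags the way the quoted snippet does."""
    flags = 0
    if force:
        flags |= FLAGS.get("VIR_DOMAIN_UNDEFINE_SNAPSHOTS_METADATA", 0)
        flags |= FLAGS.get("VIR_DOMAIN_UNDEFINE_MANAGED_SAVE", 0)
        if loader_ro is True and loader_type == "pflash":
            flags |= FLAGS.get("VIR_DOMAIN_UNDEFINE_NVRAM", 0)
    return flags


def requests_nvram_removal(flags):
    """True if the NVRAM bit is set in flags."""
    return bool(flags & FLAGS["VIR_DOMAIN_UNDEFINE_NVRAM"])
```

With `force=False` the result is 0 whatever the loader settings are, so in this sketch NVRAM removal is only ever requested on a forced undefine of a read-only pflash loader.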

Changed in virt-manager (Ubuntu):
status: New → Incomplete
Mike Matera (fatboymaximus) wrote :

My pleasure. Here's what I tried:

Part 1:
 - Create a VM with the wrong firmware. Try to start it (fails). Abort using the Cancel button.
 - Verify that /var/lib/libvirt/qemu/nvram/TEST123_VARS.fd exists.
 - Run virsh:

$ virsh undefine --nvram TEST123
error: failed to get domain 'TEST123'
error: Domain not found: no domain with matching name 'TEST123'

So the cancel does remove the domain. Move on to part 2.

Part 2:

Looking for ways to "abort it hard" in the GUI: Once the VM has started with the wrong firmware there are three ways out of the customize hardware window:

  1. Begin Installation: Fails and takes you back where you started.
  2. Cancel Installation: Causes the problem reported.
  3. Swatting the window with the "X": Same as #2

So at this point it looks like there's no way to abort the leak of the firmware file. So, I checked for a denial by AppArmor:

$ dmesg -T | tail -n 5
[Tue Jul 11 10:40:18 2017] audit: type=1400 audit(1499794771.816:77): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-239ba97a-d01b-4127-916c-0f07d7c66e0a" pid=2242 comm="apparmor_parser"
[Tue Jul 11 10:40:18 2017] audit: type=1400 audit(1499794771.844:78): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-239ba97a-d01b-4127-916c-0f07d7c66e0a//qemu_bridge_helper" pid=2242 comm="apparmor_parser"
[Tue Jul 11 10:40:18 2017] br0: port 2(vnet0) entered disabled state
[Tue Jul 11 10:40:18 2017] device vnet0 left promiscuous mode
[Tue Jul 11 10:40:18 2017] br0: port 2(vnet0) entered disabled state

Those are not denials. They look like what happens when you create a VM, but I'm not certain.
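
Scanning only the last five lines can miss an earlier denial. A small Python sketch that filters a whole log for AppArmor denials follows; it assumes that denials carry the marker `apparmor="DENIED"`, in the same position where the lines above carry `apparmor="STATUS"`. The function name `apparmor_denials` is hypothetical.

```python
DENIED_MARKER = 'apparmor="DENIED"'  # STATUS lines, as above, are not denials


def apparmor_denials(log_text):
    """Return the log lines that record an AppArmor denial."""
    return [line for line in log_text.splitlines() if DENIED_MARKER in line]
```

The input could be the text output of `dmesg -T` captured to a file or read through `subprocess.run`; an empty result for the time of the failed undefine would support the conclusion that AppArmor is not involved.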

Next, I checked if using virsh from the command line leaks the NVRAM file:

  1. Create a VM, customize it to use UEFI with the correct arch.
  2. Start the installer and force-off the VM.
  3. Check for deletion:

# ls /var/lib/libvirt/qemu/nvram/
TEST1234_VARS.fd
# virsh undefine --nvram TEST1234
Domain TEST1234 has been undefined
# ls /var/lib/libvirt/qemu/nvram/
TEST1234_VARS.fd

virsh does not delete the file. So it's conclusively an issue with both the command line and the GUI. So looking at the code you provided I went to find the corresponding XML and I found this fragment:

  <os>
    <type arch='x86_64' machine='pc-i440fx-xenial'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
    <boot dev='hd'/>
  </os>

I've never used XML in Python, so I can't be certain, but it seems like this line:

            if (self.get_xmlobj().os.loader_ro is True and

Should look like this:

            if (self.get_xmlobj().os.loader_readonly is True and

The second line matches the XML attribute exactly.

Hope this helps!

Christian Ehrhardt  (paelzer) wrote :

Thanks for reporting back.
IMHO it is not the code section:
    <loader readonly='yes' type='pflash'>/usr/share/AAVMF/AAVMF_CODE.fd</loader>
But the nvram section that causes this - here is an example I have:
    <nvram template='/usr/share/AAVMF/AAVMF_VARS.fd'>/tmp/testguest-flash1.img</nvram>

That gets me the vars to the specified directory.
And a "virsh undefine --nvram" removes it.

You didn't have that section about the nvram at all, so I removed mine and checked the default (at least in my case).
And I found it creating a file similar to your path:
  /var/lib/libvirt/qemu/nvram/testguest_VARS.fd
But I also found it to delete it correctly.

$ sudo ls -laF /var/lib/libvirt/qemu/nvram/testguest_VARS.fd
ls: cannot access '/var/lib/libvirt/qemu/nvram/testguest_VARS.fd': No such file or directory
$ virsh define testguest-fixed.xml; virsh start testguest
Domain testguest defined from testguest-fixed.xml
Domain testguest started
$ virsh dumpxml testguest | grep VARS
    <nvram>/var/lib/libvirt/qemu/nvram/testguest_VARS.fd</nvram>
$ sudo ls -laF /var/lib/libvirt/qemu/nvram/testguest_VARS.fd
-rw------- 1 libvirt-qemu kvm 67108864 Jul 12 06:55 /var/lib/libvirt/qemu/nvram/testguest_VARS.fd
$ virsh destroy testguest; virsh undefine --nvram testguest
Domain testguest destroyed
Domain testguest has been undefined
$ sudo ls -laF /var/lib/libvirt/qemu/nvram/testguest_VARS.fd
ls: cannot access '/var/lib/libvirt/qemu/nvram/testguest_VARS.fd': No such file or directory

You might only see the nvram section in your XML while it is running as it takes the defaults then.
Therefore - it is the nvram and not the loader section in the XML - I'd assume the readonly attribute is not important.
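
To see which of the two elements a given domain actually carries, the XML can be inspected directly. A minimal Python sketch, assuming the input is a full domain document as printed by `virsh dumpxml` (a `<domain>` root with an `<os>` child); the function name `firmware_summary` and its result keys are hypothetical.

```python
import xml.etree.ElementTree as ET


def firmware_summary(domain_xml):
    """Extract the loader attributes and the NVRAM path from domain XML.

    Missing elements or attributes are reported as None.
    """
    root = ET.fromstring(domain_xml)
    loader = root.find("./os/loader")
    nvram = root.find("./os/nvram")
    return {
        "loader_readonly": loader.get("readonly") if loader is not None else None,
        "loader_type": loader.get("type") if loader is not None else None,
        "nvram_path": nvram.text if nvram is not None else None,
    }
```

Running this against the XML of a stopped and then a running guest would show whether the `<nvram>` path only appears once the defaults have been filled in.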

Still I wonder why it doesn't work for you.

Christian Ehrhardt  (paelzer) wrote :

Ah, I tried on Artful but you are on Xenial. I have only one box, and this one will not work on Xenial.
Do you have a chance to retry on Artful? That would tell us whether it works for you there and we are facing an issue that is only broken back in Xenial.

Launchpad Janitor (janitor) wrote :

[Expired for virt-manager (Ubuntu) because there has been no activity for 60 days.]

Changed in virt-manager (Ubuntu):
status: Incomplete → Expired