Use changed nested VMX attribute as trigger to refresh libvirt capability cache
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
libvirt (Ubuntu) | Fix Released | Medium | Unassigned |
Xenial | Invalid | Undecided | Unassigned |
Bionic | Fix Released | Undecided | Unassigned |
Cosmic | Fix Released | Undecided | Unassigned |
Bug Description
[impact]
libvirt caches the host's 'nested vmx' capability and does not update it even if the host's ability to handle nested vmx changes. With this domcapability missing, no guest can start nested, KVM-accelerated guests of its own. Additionally, since OpenStack live migration requires matching CPU features, guests that do have vmx enabled cannot be migrated to hosts where libvirt thinks nested vmx is disabled.
Once the kernel module (kvm_intel) is reloaded with 'nested' enabled, libvirt does not update its domcapabilities cache, not even across a libvirtd restart or an entire system reboot. Only certain conditions cause libvirt to update its capabilities cache (possibly a libvirt, qemu, or kernel upgrade; I haven't verified any of those yet).
libvirt creates caches for its domcapabilities at /var/cache/libvirt/qemu/capabilities/
Removing the cache XML files there and restarting libvirtd will cause the caches to be recreated with the correct current values.
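The manual workaround can be sketched as follows; the cache path is assumed, and the commands are shown against a scratch directory so the sketch is safe to run anywhere:

```shell
# Sketch of the manual workaround, using a scratch directory. On a real
# host the directory would be /var/cache/libvirt/qemu/capabilities/
# (assumed), the rm would need sudo, and you would restart libvirtd
# afterwards so the cache is regenerated with current values.
cache=$(mktemp -d)
touch "$cache/3100.xml" "$cache/3200.xml"   # stand-ins for cached capability XMLs
rm -f "$cache"/*.xml                        # drop the stale cache entries
ls -A "$cache" | wc -l                      # prints 0: cache is now empty
```

On the real host the last step would be `sudo systemctl restart libvirtd` rather than a directory listing.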
The fix backports the upstream fix:
https:/
which makes libvirt always compare the current nested attribute against the value stored with the cache.
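The idea of that check can be sketched like this (variable names are illustrative, not libvirt's; a temp file stands in for the real sysfs attribute so the sketch runs anywhere):

```shell
# Illustrative staleness check: compare the nested value recorded when
# the cache was written with the current one. A temp file stands in for
# /sys/module/kvm_intel/parameters/nested; on arches that lack the
# attribute the read falls back to "-".
param=$(mktemp); echo Y > "$param"
cached="N"                                       # value stored with the cache
current=$(cat "$param" 2>/dev/null || echo "-")  # current kernel setting
if [ "$current" = "$cached" ]; then
    echo "cache still valid"
else
    echo "nested changed ($cached -> $current): refresh capabilities"
fi
```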
[test case]
check the kvm_intel module nested parameter:
$ cat /sys/module/kvm_intel/parameters/nested
Y
It can be Y or N. Make sure libvirt agrees with the current setting:
$ virsh domcapabilities | grep vmx
<feature policy='require' name='vmx'/>
if 'nested' is Y, domcapabilities should include a vmx feature line; if 'nested' is N, it should have no output (i.e. vmx not supported in guests).
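To make that cross-check mechanical, a small (hypothetical) helper can compare the two sources; it takes the values as arguments so it can be exercised without kvm or virsh present:

```shell
# Hypothetical helper: given the nested parameter value and the number of
# vmx feature lines virsh reported, say whether kernel and libvirt agree.
check_agreement() {
    nested="$1"     # contents of /sys/module/kvm_intel/parameters/nested
    vmx_lines="$2"  # e.g. $(virsh domcapabilities | grep -c vmx)
    if { [ "$nested" = "Y" ] && [ "$vmx_lines" -gt 0 ]; } || \
       { [ "$nested" = "N" ] && [ "$vmx_lines" -eq 0 ]; }; then
        echo "agree"
    else
        echo "MISMATCH: stale capabilities cache?"
    fi
}
check_agreement Y 1   # agree
check_agreement N 1   # MISMATCH (the bug: cache still advertises vmx)
```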
Then change the kernel nested setting and re-check domcapabilities: neither restarting libvirtd nor rebooting the entire system updates the cache.
$ virsh domcapabilities | grep vmx
$ cat /sys/module/kvm_intel/parameters/nested
N
$ sudo rmmod kvm_intel
$ sudo modprobe kvm_intel nested=1
$ cat /sys/module/kvm_intel/parameters/nested
Y
$ virsh domcapabilities | grep vmx
$ sudo systemctl restart libvirtd
$ virsh domcapabilities | grep vmx
$
Not only should it work, but with libvirt debug logging configured [1], the fix should leave a message like this when it triggers:
VIR_DEBUG("... value changed from %d", ...)
Test #2:
- restart libvirtd
- call `virsh domcapabilities`
- repeat the above
- this should later on use the cache (faster)
- If it always regenerates the cache (see spawned qemu processes and new file dates), the detection is wrong
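One way to spot the unwanted regeneration is to watch a cache file's mtime across two calls; a scratch file stands in for the cache XML here (the real files would live under /var/cache/libvirt/qemu/capabilities/, an assumed path):

```shell
# Sketch: a file that is only read keeps its mtime; a regenerated cache
# file gets a new one. A scratch file simulates the cache XML.
cachefile=$(mktemp)
before=$(stat -c %Y "$cachefile")
sleep 1
cat "$cachefile" > /dev/null   # a cache *hit* only reads the file
after=$(stat -c %Y "$cachefile")
if [ "$before" = "$after" ]; then
    echo "cache reused"        # expected with correct detection
else
    echo "cache regenerated"   # seeing this every time means the trigger misfires
fi
```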
Test #3:
- some arches (e.g. s390x) don't have this attribute; check on one of those how the behavior changes.
[regression potential]
This will make libvirt refresh the capability cache more often, which is a fairly expensive task: each installed qemu binary (anywhere from none to all arch emulators plus the KVM-based ones, ~10) is forked and probed again. The new code adds a rather safe trigger, since the nested attribute usually only changes on a reboot or a module reload, so this should be low risk. The one real regression would be if the detection were wrong and always triggered.
I added Test #2 above to check for that.
[other info]
related RH bugs, though no changes appear to have resulted from either:
https:/
https:/
description: updated
summary:
- libvirt caches nested vmx capability (in domcapabilities)
+ Use more triggers to refresh libvirt capability cache
Changed in libvirt (Ubuntu Xenial):
status: New → Triaged
description: updated
Hi, it didn't let me rest that we had qemu QMP reporting VMX as off.
I mean I have heard (there never was an Ubuntu bug, but people on IRC have run into it) of issues that people needed to regen capabilities.
After checking the logs of yesterday I found the trap that you laid for me :-)
When checking QMP we were on a system that really had
$ cat /sys/module/kvm_intel/parameters/nested
N
So the bit in the cpuid was off.
But when we made the cross check with the cpuid tool we were on a different system, probably with
$ cat /sys/module/kvm_intel/parameters/nested
Y
But the caps cache was not regenerated, hence the tool there reported the bit as set.
With that red herring put aside, I can let go of my confusion and focus on what is ahead. As discussed on IRC, we might want to look into:
a) verifying that today's triggers for a reload work properly (e.g. new qemu binary)
b) considering more triggers (maybe: libvirtd restart, module load times; surely: reboot)
Not sure if that will be today or next week, but manually cleaning /var/cache/libvirt/qemu/capabilities/ gives you a workaround until then.