Libvirt does not follow RESUME qemu monitor events. VMs remain in "paused" state forever
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
libvirt (Ubuntu) | Fix Released | High | Serge Hallyn |
Precise | Fix Released | High | Serge Hallyn |
Quantal | Won't Fix | High | Unassigned |
Bug Description
=======
SRU Justification:
1. Impact: if a VM is paused through the monitor and then resumed, libvirt will continue to report the running VM as paused.
2. Development fix: add a hook to follow the resume event
3. Stable fix: same as development fix
4. Test case: see below
5. Regression potential: an error in the hook could turn the above situation into a crash instead of libvirt following the VM resume. All regression tests passed with this fix.
=======
If a qemu/KVM VM is paused through a monitor by manually issuing the "stop" command, the state of the VM in libvirtd's view transitions to "paused". This is because libvirtd listens to "STOP" events in the JSON monitor. However, libvirt does not listen to RESUME events on any monitor. So when the VM is resumed by manually issuing "cont", the internal state remains "paused" even though the VM is running.
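The mismatch described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not libvirt's actual code: the class `DomainStateTracker`, its handler table, and the `handle_resume` flag are hypothetical names invented for illustration. Only the event names STOP and RESUME come from the report.

```python
import json


class DomainStateTracker:
    """Toy model (hypothetical) of a manager mirroring VM state from monitor events."""

    def __init__(self, handle_resume):
        self.state = "running"
        # Map monitor event names to the state they imply.
        self.handlers = {"STOP": "paused"}
        if handle_resume:
            # The fix: also follow RESUME events.
            self.handlers["RESUME"] = "running"

    def feed(self, line):
        """Process one JSON event line as it might arrive on the monitor."""
        event = json.loads(line).get("event")
        new_state = self.handlers.get(event)
        if new_state is not None:
            self.state = new_state
        # Events without a handler are dropped, so the tracked state goes stale.


# A manual "stop" followed by a manual "cont" produces these two events.
events = ['{"event": "STOP"}', '{"event": "RESUME"}']

buggy = DomainStateTracker(handle_resume=False)
fixed = DomainStateTracker(handle_resume=True)
for line in events:
    buggy.feed(line)
    fixed.feed(line)

print(buggy.state)  # paused
print(fixed.state)  # running
```

Without a RESUME handler the tracker never leaves "paused", which matches the behaviour reported here.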
Libvirt maintains its internal view of the state in sync for migration, etc. But without listening to RESUME events it cannot correctly cope with third parties issuing stop commands (such as GDB, virsh qemu-monitor-
This is verified to happen on Precise and Quantal's libvirt versions. Since it's a bug in upstream, I expect it to be faulty in Raring as well.
The upshot in Openstack is that VMs, even though running, will be reported as paused to nova. Due to (https:/
Steps to Reproduce:
# virsh list
Id Name State
-------
1 instance-00000020 running
# virsh qemu-monitor-
{"return"
# virsh list
Id Name State
-------
1 instance-00000020 paused
# virsh qemu-monitor-
{"return"
# virsh list
Id Name State
-------
1 instance-00000020 paused
(the state should be "running")
Another way to reproduce this is to attach GDB to qemu and start single-stepping: libvirt will drop dozens of RESUME events and be mightily confused.
Client software like OpenStack will tag the VM as paused.
Upstream:
Reported to libvirt upstream: https:/
Fixed in libvirt's master git: http://
I will attach a backport of the master branch fix to 0.9.13-
Changed in libvirt (Ubuntu):
assignee: nobody → Serge Hallyn (serge-hallyn)
status: Triaged → In Progress
Changed in libvirt (Ubuntu Precise):
assignee: nobody → Serge Hallyn (serge-hallyn)
status: Triaged → In Progress
description: updated
With the above patch:
# virsh list
 Id Name State
----------------------------------------------------
 1 instance-00000022 running
# virsh qemu-monitor-command 1 '{"execute":"stop"}'
{"return":{},"id":"libvirt-12"}
# virsh list
 Id Name State
----------------------------------------------------
 1 instance-00000022 paused
# virsh qemu-monitor-command 1 '{"execute":"cont"}'
{"return":{},"id":"libvirt-13"}
# virsh list
 Id Name State
----------------------------------------------------
 1 instance-00000022 running
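The manual check above could be automated along these lines. This is a hedged sketch, not part of the fix: the helper names `monitor_argv` and `parse_virsh_list` are hypothetical, and the parser assumes the three-column `virsh list` layout shown in this report. The sketch only builds the command line and parses captured text; it does not invoke virsh itself.

```python
import json


def monitor_argv(domain, command):
    """Build the argv for `virsh qemu-monitor-command` with a bare QMP command."""
    return ["virsh", "qemu-monitor-command", str(domain),
            json.dumps({"execute": command})]


def parse_virsh_list(output):
    """Return {name: state} from text in the `virsh list` layout shown above."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a numeric domain ID; header and separator do not.
        if len(fields) >= 3 and fields[0].isdigit():
            # Join the remainder so a multi-word state stays intact.
            states[fields[1]] = " ".join(fields[2:])
    return states


sample = """\
 Id Name State
----------------------------------------------------
 1 instance-00000022 running
"""

print(monitor_argv(1, "cont"))
print(parse_virsh_list(sample))
```

A test harness could pass the argv lists for "stop" and "cont" to `subprocess.run` and compare the parsed state after each step against "paused" and "running".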