virsh destroy might also kill another running VM

Bug #362288 reported by Thierry Carrez
This bug affects 5 people
Affects            Status        Importance  Assigned to      Milestone
libvirt (Fedora)   Fix Released  High
libvirt (Ubuntu)   Fix Released  Critical    Unassigned
  Jaunty           Fix Released  Critical    Dustin Kirkland

Bug Description

Jaunty KVM amd64 host running:
  libvirt 0.6.1-0ubuntu4
  kvm 1:84+dfsg-0ubuntu10

I'm starting multiple VMs (using virsh start or virt-manager)
Then calling "virsh destroy" on the first one in the list also destroys the second one:

ttx@cassini:~$ virsh list
Connecting to uri: qemu:///system
 Id Name State
----------------------------------
  5 intrepid-test running
  6 jaunty-devel running

ttx@cassini:~$ virsh destroy intrepid-test
Connecting to uri: qemu:///system
Domain intrepid-test destroyed

ttx@cassini:~$ virsh list
Connecting to uri: qemu:///system
 Id Name State
----------------------------------

ttx@cassini:~$

Running strace on the-one-that-shouldn't-be-killed reveals it receives SIGKILL.

Test scenarios:
Starting A, Starting B, virsh destroy A -> A and B get destroyed
Starting B, Starting A, virsh destroy B -> A and B get destroyed
Starting A, Starting B, virsh destroy B -> only B is destroyed
Starting A, Starting B, Starting C, virsh destroy A -> A and B get destroyed
Starting A, Starting B, Starting C, virsh destroy B -> B and C get destroyed

I can reproduce it on freshly created VMs stuck at "select boot device" BIOS stage (I hit F12) so it probably doesn't depend on the guest nature.

Using "Force off" from virt-manager also triggers the issue.

Revision history for this message
Andrew Bolster (bolster) wrote :

Possibly related: attempting to reboot VMs from the Virtual Machine Manager window causes this complaint:

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/engine.py", line 528, in reboot_domain
    vm.reboot()
  File "/usr/share/virt-manager/virtManager/domain.py", line 504, in reboot
    self.vm.reboot(0)
  File "/usr/lib/python2.6/dist-packages/libvirt.py", line 392, in reboot
    if ret == -1: raise libvirtError ('virDomainReboot() failed', dom=self)
libvirtError: this function is not supported by the hypervisor: virDomainReboot

Revision history for this message
Andrew Bolster (bolster) wrote :

Forgot to mention: same arch, same version of kvm, and libvirt 0.6.1-0ubuntu4. I tested the scenarios listed above with and without guest OSes installed; all confirmed.

Revision history for this message
Thierry Carrez (ttx) wrote :

Bolster: yes, I also experience that; however, this is a different bug. Could you please open a separate bug about it? Thanks!

Revision history for this message
Pekka Jääskeläinen (pekka-jaaskelainen) wrote :

I'm also experiencing this. This issue combined with a strange randomish guest freezing problem (with a Debian Lenny guest) makes KVM unusable in the Jaunty host for me.

Thierry Carrez (ttx)
Changed in libvirt (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
In , Will (will-redhat-bugs) wrote :

Sometimes, using 'virsh destroy' or the 'Force off' button in virt-manager will cause multiple running VMs to be destroyed. Here's an example:

[wwoods@metroid ~]$ sudo virsh list --all
 Id Name State
----------------------------------
  1 Ubuntu_Jaunty running
  4 Fedora_10_clone running
  5 F10_2 running
  6 F10 running

[wwoods@metroid ~]$ sudo virsh destroy F10_2
Domain F10_2 destroyed

[wwoods@metroid ~]$ sudo virsh list --all
 Id Name State
----------------------------------
  1 Ubuntu_Jaunty running
  4 Fedora_10_clone running
  - F10 shut off
  - F10_2 shut off

Note that 'F10' is now also shut off, even though I didn't destroy it. This doesn't seem to happen every time, and it doesn't seem to be related to the names of the hosts being similar:

[wwoods@metroid ~]$ sudo virsh list --all
 Id Name State
----------------------------------
  1 Ubuntu_Jaunty running
  4 Fedora_10_clone running
  8 F10_2 running
  - F10 shut off

[wwoods@metroid ~]$ sudo virsh start F10
Domain F10 started

[wwoods@metroid ~]$ sudo virsh destroy Fedora_10_clone
Domain Fedora_10_clone destroyed

[wwoods@metroid ~]$ sudo virsh list --all
 Id Name State
----------------------------------
  1 Ubuntu_Jaunty running
 10 F10 running
  - F10_2 shut off
  - Fedora_10_clone shut off

There are no relevant messages in syslog, other than the expected ones (e.g. "kernel: virbr0: port 3(vnet2) entering disabled state" as the host comes down).

Revision history for this message
In , Daniel (daniel-redhat-bugs) wrote :

Can you run 'strace -f -p $PID-OF-LIBVIRTD -s 1000 -ff -o' and then try and reproduce the destroy problem.

Also, can you turn on full debug logging of libvirtd & capture the results http://libvirt.org/logging.html

Revision history for this message
In , Mark (mark-redhat-bugs) wrote :

Nasty. I've asked wwoods for a libvirtd log ala:

https://fedoraproject.org/wiki/Reporting_virtualization_bugs#libvirt

Revision history for this message
In , Will (will-redhat-bugs) wrote :

Created attachment 342908
very verbose log from libvirtd

This is the full log from libvirtd with log_level set to 1. It follows these basic steps:

service libvirtd restart
virsh list --all
 Id Name State
----------------------------------
  1 Ubuntu_Jaunty running
 13 F10 running
  - F10_2 shut off
  - F10_RAID shut off
  - F9 shut off
  - Fedora_10_clone shut off
  - Rawhide shut off

virsh start F10_2
virsh start Fedora_10_clone
virsh list --all
virsh destroy F10_2
virsh list --all
virsh start F10_2
virsh destroy F10
virsh list --all
virsh start F10
virsh list --all
virsh list --all
virsh start F10_2
virsh list --all
# problem is triggered here - F10 dies as well
virsh destroy F10_2
virsh list --all
 Id Name State
----------------------------------
  1 Ubuntu_Jaunty running
 15 Fedora_10_clone running
  - F10 shut off
  - F10_2 shut off
  - F10_RAID shut off
  - F9 shut off
  - Rawhide shut off

service libvirtd stop

Hope you can make some sense of it - it's 4MB uncompressed.

Revision history for this message
In , Mark (mark-redhat-bugs) wrote :

Interesting, you only issued a destroy for F10_2, but yet:

14:12:43.927: debug : virDomainDestroy:1750 : domain=0x7fb270001270
14:12:43.927: debug : qemudShutdownVMDaemon:1518 : Shutting down VM 'F10_2'

14:12:43.927: debug : virEventRemoveHandleImpl:165 : Remove handle 20
14:12:43.927: debug : virEventRemoveHandleImpl:172 : mark delete 11 24
14:12:43.983: debug : virEventRunOnce:544 : Poll got 1 event
14:12:43.983: debug : virEventDispatchHandles:416 : Skip deleted 24
14:12:43.983: debug : virEventDispatchHandles:425 : Dispatch 24 32 0x13f7c00
14:12:44.036: info : Setting SELinux context on '/var/lib/libvirt/images/F10_2.img' to 'system_u:object_r:virt_image_t:s0'
14:12:44.036: debug : virEventUpdateTimeoutImpl:233 : Updating timer 0 timeout with 0 ms freq
14:12:44.036: debug : qemudShutdownVMDaemon:1518 : Shutting down VM 'F10'
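In a 4MB log this pattern is easy to miss, so a small filter may help. The following is only a sketch: it assumes the debug lines have exactly the shape quoted above (`virDomainDestroy:<line> : ...` and `qemudShutdownVMDaemon:<line> : Shutting down VM '<name>'`), and the function name `shutdowns_per_destroy` is made up for illustration. It groups every "Shutting down VM" line under the destroy call that precedes it, so a group with more than one name points at a collateral shutdown.

```python
import re

# Patterns copied from the shape of the log lines quoted above (an assumption
# about the format; other libvirt versions may log differently).
DESTROY_RE = re.compile(r"virDomainDestroy:\d+ : ")
SHUTDOWN_RE = re.compile(r"qemudShutdownVMDaemon:\d+ : Shutting down VM '([^']+)'")


def shutdowns_per_destroy(log_lines):
    """Group 'Shutting down VM' names under the preceding virDomainDestroy call."""
    groups = []
    for line in log_lines:
        if DESTROY_RE.search(line):
            groups.append([])  # a new destroy request starts a new group
            continue
        match = SHUTDOWN_RE.search(line)
        if match and groups:
            groups[-1].append(match.group(1))
    return groups


# The three relevant lines from the excerpt above.
sample = [
    "14:12:43.927: debug : virDomainDestroy:1750 : domain=0x7fb270001270",
    "14:12:43.927: debug : qemudShutdownVMDaemon:1518 : Shutting down VM 'F10_2'",
    "14:12:44.036: debug : qemudShutdownVMDaemon:1518 : Shutting down VM 'F10'",
]

for names in shutdowns_per_destroy(sample):
    if len(names) > 1:
        print("one destroy call, several shutdowns:", names)
```

Run against the full log (for example `shutdowns_per_destroy(open(path))`, with `path` pointing at the attachment), any group longer than one entry is worth a closer look.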

Revision history for this message
In , Daniel (daniel-redhat-bugs) wrote :

This is an event loop dispatcher bug. The destroy call kills the QEMU process, so we then get a HANGUP event on the FD associated with the guest monitor. The callback for this FD is already marked as deleted in the event loop, so it gets skipped, and then we mistakenly dispatch the next callback in the loop. That makes us think another VM has died and triggers cleanup of that guest.
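The failure mode described here can be illustrated with a toy model. This is not libvirt's code: the `Handle` class, the two dispatch functions, and the idea of a separate cursor into the poll results are all assumptions chosen to show how skipping a deleted entry without consuming its event hands that event to the next callback.

```python
from dataclasses import dataclass


@dataclass
class Handle:
    name: str              # guest whose monitor FD this handle watches
    deleted: bool = False  # set when the handle is marked for removal


def dispatch_buggy(handles, revents):
    """revents[i] is True if the FD behind handles[i] reported an event."""
    fired = []
    cursor = 0  # index into revents
    for handle in handles:
        if handle.deleted:
            # BUG (illustrative): the event slot of the deleted handle is not
            # consumed, so the next live handle reads it instead.
            continue
        if revents[cursor]:
            fired.append(handle.name)
        cursor += 1
    return fired


def dispatch_fixed(handles, revents):
    """Keep handles and poll results aligned, then skip deleted entries."""
    return [h.name for h, ev in zip(handles, revents) if ev and not h.deleted]


# Guests A, B, C started in that order; A has just been destroyed, so its
# handle is marked deleted and its FD reports a hangup.
handles = [Handle("A", deleted=True), Handle("B"), Handle("C")]
revents = [True, False, False]

print(dispatch_buggy(handles, revents))  # ['B']
print(dispatch_fixed(handles, revents))  # []
```

In this model the guest registered right after the destroyed one is the victim, and destroying the last registered guest harms nobody, which is consistent with the scenarios reported for Jaunty.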

Revision history for this message
In , Daniel (daniel-redhat-bugs) wrote :

*** Bug 499788 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Daniel (daniel-redhat-bugs) wrote :

Created attachment 343109
Fix event loop handling of deletes & test functionality

This patch fixes the event loop handling of deletes and adds a test case which validates that the various important scenarios actually work.

Revision history for this message
In , Daniel (daniel-redhat-bugs) wrote :

*** Bug 500089 has been marked as a duplicate of this bug. ***

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

I too see this. Attached are the XML definitions for two affected machines.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :
Revision history for this message
Jamie Strandboge (jdstrand) wrote :

Interestingly, as Thierry pointed out, if I run 5 VMs sec-(dapper|hardy|intrepid|jaunty|karmic)-i386, and start them in the order I listed, and do 'virsh destroy sec-intrepid-i386', then only sec-jaunty-i386 gets additionally destroyed. dapper, hardy and karmic all stay up. I tried this several times and the two machines that are destroyed are always the one specified and the one started immediately after it.

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

libvirt-0.6.2-9.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/libvirt-0.6.2-9.fc11

Revision history for this message
In , Mark (mark-redhat-bugs) wrote :

I've pushed libvirt-0.6.2-9.fc11 to updates-testing with Dan's fix. Please test and update the update's karma using the link above

* Thu May 21 2009 Mark McLoughlin <email address hidden> - 0.6.2-9.fc11
- Fix qemu argv detection with latest qemu (bug #501923)
- Fix XML attribute escaping (bug #499791)
- Fix serious event handling issues causing guests to be destroyed (bug #499698)

Revision history for this message
Justin Huff (jjhuff) wrote :
Revision history for this message
Woo (w-digmia) wrote :

You seriously mean that importance of this bug is medium?

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

libvirt-0.6.2-9.fc11 has been pushed to the Fedora 11 testing repository. If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update libvirt'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-5311

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

libvirt-0.6.2-10.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/libvirt-0.6.2-10.fc11

Revision history for this message
Michal Ingeli (xyzz) wrote :

Severity high, also according to upstream. This flaw makes libvirt practically unusable.

Revision history for this message
Andrew Bolster (bolster) wrote :

Has anyone confirmed whether the Fedora patch noted above actually works?

2009/5/22 Michal Ingeli <email address hidden>

> Severity high, also according to upstream. This flaw makes libvirt
> practically unusable.

Revision history for this message
Andrew Bolster (bolster) wrote :

I cannot find any feedback regarding the Fedora fix noted by JH. It has been put into updates-testing, but there are no responders. Has anyone else tried this? I will test tonight and give feedback here and on bz.

Changed in libvirt (Fedora):
status: Unknown → Fix Committed
Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

libvirt-0.6.2-10.fc11 has been pushed to the Fedora 11 testing repository. If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update libvirt'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-5441

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

libvirt-0.6.2-11.fc11 has been pushed to the Fedora 11 testing repository. If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update libvirt'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-5515

Revision history for this message
Michal Ingeli (xyzz) wrote :

I can confirm, patch worked.

Attaching ported patch from fedora, if it's not already done.

Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

libvirt-0.6.2-11.fc11 has been pushed to the Fedora 11 stable repository. If problems still persist, please make note of it in this bug report.

Changed in libvirt (Fedora):
status: Fix Committed → Fix Released
Revision history for this message
In , Fedora (fedora-redhat-bugs) wrote :

libvirt-0.6.2-12.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/libvirt-0.6.2-12.fc11

Revision history for this message
Radovan Pútec (radko) wrote :

this is not funny anymore. this issue should be fixed immediately.

Revision history for this message
Joshua Timberman (jtimberman) wrote :

I don't see any updates to the libvirt package. Is this package updated, or will it be soon?

Revision history for this message
Simon Morvan (simon-icilalune) wrote :

Any updates ?

Revision history for this message
David Sedeño Fernandez (david-alderia) wrote :

I'm trying to rebuild the package with the patch, but debuild doesn't work. The patch appears to interfere with another one:

 Patch event-loop-hang.diff does not apply

Revision history for this message
Oliver Siegmar (osiegmar) wrote :

Please note that 'virsh save domain' also destroys other domains. Any updates here?

Revision history for this message
Bryan McLellan (btm) wrote :

Michal Ingeli's patch (http://launchpadlibrarian.net/27207697/lp-362288.patch) did not resolve the issue for me after being converted by hand to quilt (attached) and applied to the patch series in libvirt=0.6.1-0ubuntu5 on jaunty.

Destroyed all guests and libvirt
Started libvirt
Started three guests
Destroyed the first guest, which consequently destroyed the second.

Perhaps my conversion is bunk. Uploading libvirt=0.6.1-0ubuntu6~btm1 to my PPA (https://launchpad.net/~btm/+archive/ppa/+sourcepub/669470/+listing-archive-extra) if someone wants to take a look.

Michal, what source did you apply your patch to where the problem was resolved for you? As David suggested (https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/362288/comments/17), the patch doesn't apply cleanly to the Ubuntu source for libvirt=0.6.1-0ubuntu5.

Revision history for this message
Bryan McLellan (btm) wrote :

Upstream patch [1] works against 0.6.1-0ubuntu5 on jaunty.

I'm attaching this patch converted to quilt sans the test code, it applies fine as the last in the series.

Testers should grab libvirt=0.6.1-0ubuntu6~btm2 from my ppa [2].

Destroyed all guests and libvirt
Started libvirt
Started three guests
Destroyed the first guest, no other guests were destroyed.

[1] https://bugzilla.redhat.com/attachment.cgi?id=343109
[2] https://launchpad.net/~btm/+archive/ppa

Revision history for this message
Bryan McLellan (btm) wrote :

Upstream patch [1] was checked into source control [2] in 72dc6d + 0a31be + 37ede4, which are in libvirt v0.6.4. Current version of libvirt in karmic is 0.6.4-1ubuntu2, so this bug should not be present in karmic.

[1] https://bugzilla.redhat.com/attachment.cgi?id=343109
[2] git://git.et.redhat.com/libvirt.git

Revision history for this message
Oliver Siegmar (osiegmar) wrote :

Bryan, I couldn't apply the event-handling-fix-rh499698-lp362288.patch against libvirt_0.6.1-0ubuntu5 -

$ quilt push
Applying patch event-handling-fix-rh499698-lp362288.patch
patching file libvirt-0.6.1/qemud/event.c
Hunk #12 FAILED at 429.
Hunk #13 succeeded at 542 (offset -1 lines).
Hunk #14 succeeded at 568 (offset -1 lines).
Hunk #15 succeeded at 612 (offset -1 lines).
Hunk #16 succeeded at 631 (offset -1 lines).
1 out of 16 hunks FAILED -- rejects in file libvirt-0.6.1/qemud/event.c
patching file libvirt-0.6.1/qemud/qemud.c
patching file libvirt-0.6.1/qemud/qemud.h
Patch event-handling-fix-rh499698-lp362288.patch does not apply (enforce with -f)

What am I missing?

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Excellent, thanks Bryan.

Ping me on IRC and I can walk you through the SRU process.

:-Dustin

Revision history for this message
Bryan McLellan (btm) wrote :

There are a lot of patches in this package, so order matters a bit.

apt-get source libvirt=0.6.1-0ubuntu5
wget http://launchpadlibrarian.net/29013416/event-handling-fix-rh499698-lp362288.patch
cd libvirt-0.6.1/
QUILT_PATCHES=debian/patches/ quilt push -a
QUILT_PATCHES=debian/patches/ quilt import ../event-handling-fix-rh499698-lp362288.patch
QUILT_PATCHES=debian/patches/ quilt push

Revision history for this message
Oliver Siegmar (osiegmar) wrote :

Thanks, Bryan! The patch is installed and works great so far (tested save, restore and destroy).

Revision history for this message
Bryan McLellan (btm) wrote :

Jaunty SRU Notes:

This bug causes a second guest to be erroneously destroyed when the user destroys a guest. Depending on the function of the second guest, this can lead to unexpected data loss and service interruption, and is inherently very destructive.

Because this bug has been fixed upstream in libvirt v0.6.4, which is in karmic, this bug does not apply to the current development release.

A quilt patch [1] has been prepared from the upstream patch [2], with the test code excluded to keep the change minimal. This patch should be applied last in the series in libvirt=0.6.1-0ubuntu5.

TEST CASE:

Create three guests, start them with 'virsh start GUEST' and verify that they are running with 'virsh list'
Destroy the first guest with 'virsh destroy GUEST'
Run 'virsh list' again and notice that two guests have in fact been destroyed

This upstream patch is almost entirely sanity checks and would not likely cause regressions.

[1] http://launchpadlibrarian.net/29013416/event-handling-fix-rh499698-lp362288.patch
[2] https://bugzilla.redhat.com/attachment.cgi?id=343109
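The test case above could be automated roughly as follows. This is a sketch under assumptions: the guests are already defined and shut off, `virsh` talks to the right URI by default, guest names contain no spaces, and the `virsh list` output looks like the listings earlier in this report. The helper names (`parse_virsh_list`, `collateral_victims`) are made up for illustration; only `virsh start`, `virsh destroy` and `virsh list` are used.

```python
import subprocess


def parse_virsh_list(output):
    """Return {name: state} parsed from `virsh list` style output."""
    domains = {}
    in_table = False
    for line in output.splitlines():
        if line.strip().startswith("---"):
            in_table = True  # rows follow the dashed separator
            continue
        if not in_table or not line.strip():
            continue
        parts = line.split(None, 2)  # id, name, state (state may be "shut off")
        if len(parts) == 3:
            domains[parts[1]] = parts[2].strip()
    return domains


def virsh(*args):
    """Run a virsh subcommand and return its stdout (needs a libvirt host)."""
    result = subprocess.run(
        ["virsh", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def running(run):
    """Names of the guests that `virsh list` reports as running."""
    return {n for n, s in parse_virsh_list(run("list")).items() if s == "running"}


def collateral_victims(guests, run=virsh):
    """Start all guests, destroy the first, return others that stopped running."""
    for guest in guests:
        run("start", guest)
    before = running(run)
    run("destroy", guests[0])
    after = running(run)
    return sorted(before - after - {guests[0]})
```

On an affected host, `collateral_victims(["guest1", "guest2", "guest3"])` (hypothetical guest names) would be expected to return one extra guest; with the patch applied it should return an empty list.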

Revision history for this message
Bryan McLellan (btm) wrote :

Setting importance to critical because this is such a destructive bug. Awaiting Soren to continue the SRU process.

Changed in libvirt (Ubuntu):
importance: Medium → Critical
status: Confirmed → Triaged
Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Thierry,

You originally reported this bug... Can you verify that Bryan's
proposed patch fixes the issue for you?

If so, I think we can move this SRU along.

:-Dustin

Changed in libvirt (Ubuntu Jaunty):
status: New → Triaged
importance: Undecided → Critical
Changed in libvirt (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Thierry Carrez (ttx) wrote :

Thanks Dustin for pushing this one.
Yes, I can confirm that Bryan's patched version fixes the issue for me. I tested libvirt=0.6.1-0ubuntu6~btm2 from Bryan's PPA. I also confirm that the problem is already fixed in karmic.

Changed in libvirt (Ubuntu Jaunty):
status: Triaged → In Progress
assignee: nobody → Dustin Kirkland (kirkland)
milestone: none → jaunty-updates
Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Okay, I've uploaded the package to jaunty-proposed.

Martin, could you take a look at that and accept it to -proposed at your convenience?

There seem to be a lot of people experiencing this bug. Please, by all means, help us test this package when it lands in -proposed.

:-Dustin

Changed in libvirt (Ubuntu Jaunty):
status: In Progress → Fix Committed
Revision history for this message
Martin Pitt (pitti) wrote :

Accepted libvirt into jaunty-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed
Revision history for this message
Bryan McLellan (btm) wrote :

libvirt-bin=0.6.1-0ubuntu5.1 from jaunty-proposed passes the SRU test case and acts as expected for me.

Martin Pitt (pitti)
tags: added: verification-done
removed: verification-needed
Revision history for this message
Thierry Carrez (ttx) wrote :

Confirmed fixed as well in jaunty-proposed.

Revision history for this message
Cody Herriges (ody-cat) wrote :

Also confirming fixed with jaunty-proposed packages.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 0.6.1-0ubuntu5.1

---------------
libvirt (0.6.1-0ubuntu5.1) jaunty-proposed; urgency=low

  * debian/patches/event-handling-fix-rh499698-lp362288.patch: cherry pick
    from upstream; fixes critical, destructive bug whereby calling
    'virsh destroy foo' kills foo, plus another vm; LP: #362288

 -- Dustin Kirkland <email address hidden> Thu, 16 Jul 2009 18:45:28 -0500

Changed in libvirt (Ubuntu Jaunty):
status: Fix Committed → Fix Released
Changed in libvirt (Fedora):
importance: Unknown → High