OpenStack Compute (nova)

error in guestfs driver setup causes orphaned libguestfs processes

Bug #1270304 reported by Solly Ross on 2014-01-17

This bug affects 5 people

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Compute (nova)	Fix Released	Undecided	Solly Ross	OpenStack Compute (nova) 2014.1 "icehouse"
	Havana	Fix Released	Undecided	Xing Yang	OpenStack Compute (nova) 2013.2.3

Bug Description

In the libguestfs driver for Nova VFS, certain errors in the setup method can cause a libguestfs process to remain running, even after the VM is terminated and/or the method that spawned the libguestfs VM has finished. These processes become zombies when killed, and can only be destroyed by restarting nova-compute.

In the case encountered, the error occurred when using Gluster as a Cinder backend and launching a VM before adding a keypair. However, the error could plausibly occur in other situations as well.
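The observation that these processes "become zombies when killed" and only go away when nova-compute restarts matches ordinary Unix process semantics, assuming the qemu-kvm appliance is a direct child of the nova-compute process. A killed child stays in state `Z` until its parent reaps it with `wait()`. If the parent never does, the entry is only cleared when the parent exits and init adopts and reaps it. The sketch below reproduces this on Linux, with `sleep` as a hypothetical stand-in for qemu-kvm; it is an illustration only, not nova or libguestfs code.

```python
import subprocess
import time
from typing import Optional, Tuple


def proc_state(pid: int) -> Optional[str]:
    """Return the one-letter state from /proc/<pid>/stat, or None if the pid is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except FileNotFoundError:
        return None
    # The command name sits in parentheses and may contain spaces,
    # so the state is the first field after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def kill_without_reaping(timeout: float = 5.0) -> Tuple[Optional[str], Optional[str]]:
    """Kill a child and report its state before and after the parent reaps it."""
    # Hypothetical stand-in for the qemu-kvm process that libguestfs launches.
    child = subprocess.Popen(["sleep", "60"])
    child.kill()  # SIGKILL, but the parent has not called wait() yet

    # Poll /proc directly; calling child.poll() would reap the child.
    deadline = time.monotonic() + timeout
    state = proc_state(child.pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(child.pid)

    child.wait()  # the parent reaps the child, freeing its process-table entry
    return state, proc_state(child.pid)


if __name__ == "__main__":
    print(kill_without_reaping())  # expected on Linux: ('Z', None)
```

The zombie holds no memory, but it persists for as long as its parent is alive and never reaps it. This is why restarting the parent is the only way to clear it.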

Tags:

Solly Ross (sross-7) on 2014-01-17

Changed in nova:
assignee:	nobody → Solly Ross (sross-7)

OpenStack Infra (hudson-openstack) wrote on 2014-01-17: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/67586

Changed in nova:
status:	New → In Progress

David Wittman (david-wittman) wrote on 2014-01-29:

Here are the qemu-kvm zombie processes which I'm seeing as a result of this same issue:

/usr/libexec/qemu-kvm -global virtio-blk-pci.scsi=off -nodefconfig -nodefaults -nographic -machine accel=kvm:tcg -cpu host,+kvmclock -m 500 -no-reboot -kernel /var/tmp/.guestfs-162/kernel.16420 -initrd /var/tmp/.guestfs-162/initrd.16420 -device virtio-scsi-pci,id=scsi -drive file=/var/lib/nova/instances/c1c6ab55-b09a-498e-b007-47195911b084/disk,cache=none,format=qcow2,id=hd0,if=none -device scsi-hd,drive=hd0 -drive file=/var/tmp/.guestfs-162/root.16420,snapshot=on,id=appliance,if=none,cache=unsafe -device scsi-hd,drive=appliance -device virtio-serial -serial stdio -device sga -chardev socket,path=/tmp/libguestfs9AiaSX/guestfsd.sock,id=channel0 -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 -append panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 TERM=xterm

Solly, can you restore your abandoned changes in Gerrit? I don't know why that gate failed, but your patch LGTM and it fixed my issue.

Solly Ross (sross-7) wrote on 2014-02-11:

@david-wittman: the change has been restored (it was a common gate bug). Hopefully someone can review it soon.

OpenStack Infra (hudson-openstack) wrote on 2014-03-04: Fix merged to nova (master)

Reviewed: https://review.openstack.org/67586
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fa62cfe15fd6dddecf657966d166a2d3a2bdd317
Submitter: Jenkins
Branch: master

commit fa62cfe15fd6dddecf657966d166a2d3a2bdd317
Author: Solly Ross <email address hidden>
Date: Fri Jan 17 16:43:48 2014 -0500

    Explicitly tear down on error in libguestfs setup()

    Previously, in the setup method of the libguestfs driver
    for Nova's VFS, on an error the handle object was simply
    dereferenced (set to None), and then a new error was raised.
    This relied on an implicit close() on the handle being called
    by GC. However, in some cases the setup progresses far enough
    that the implicit close is not enough, and leaves an "orphaned"
    libguestfs VM.

    Now, the teardown() method is called explicitly, which ensures that
    no "orphaned" VMs are left around in case of an error.

    Change-Id: I7fbe470046ce6c76bc13d77d8df360351a3ef715
    Fixes: bug #1270304
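The change described in the commit message can be sketched as follows. Instead of dropping the handle reference and relying on garbage collection to close it, the error path in `setup()` calls `teardown()` before raising. This is not nova's actual driver code. `FakeGuestFSHandle` is a hypothetical stand-in for a libguestfs handle; its `shutdown()`/`close()` names are loosely modeled on the libguestfs Python bindings. `VFSSketch` is also hypothetical: only the `setup()`/`teardown()` method names come from the commit, and their bodies are assumptions.

```python
from typing import Callable, Optional


class FakeGuestFSHandle:
    """Hypothetical stand-in for a libguestfs handle; records what was called."""

    def __init__(self, fail_on_mount: bool = False) -> None:
        self.fail_on_mount = fail_on_mount
        self.launched = False
        self.shut_down = False
        self.closed = False

    def launch(self) -> None:
        self.launched = True  # a real handle would boot the appliance VM here

    def mount(self) -> None:
        if self.fail_on_mount:
            raise RuntimeError("mount failed")

    def shutdown(self) -> None:
        self.shut_down = True  # a real handle would stop the appliance here

    def close(self) -> None:
        self.closed = True


class VFSSketch:
    """Hypothetical VFS wrapper showing explicit teardown when setup() fails."""

    def __init__(self, make_handle: Callable[[], FakeGuestFSHandle]) -> None:
        self.make_handle = make_handle
        self.handle: Optional[FakeGuestFSHandle] = None

    def setup(self) -> None:
        self.handle = self.make_handle()
        try:
            self.handle.launch()
            self.handle.mount()
        except RuntimeError as e:
            # Old behaviour: set self.handle = None and raise, leaving cleanup
            # to an implicit close() whenever the handle is collected.
            # New behaviour: tear the appliance down explicitly first.
            self.teardown()
            raise RuntimeError(f"Error mounting image: {e}") from e

    def teardown(self) -> None:
        if self.handle is None:
            return
        try:
            if self.handle.launched:
                self.handle.shutdown()
        finally:
            # Always close and drop the handle, even if shutdown() fails.
            self.handle.close()
            self.handle = None
```

Because `teardown()` runs before the new exception propagates, cleanup no longer depends on when, or whether, the handle object is garbage-collected.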

Changed in nova:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2014-03-04: Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/77914

Thierry Carrez (ttx) on 2014-03-05

Changed in nova:
milestone:	none → icehouse-3
status:	Fix Committed → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2014-03-19: Fix merged to nova (stable/havana)

Reviewed: https://review.openstack.org/77914
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=06dd47f3dcdd33da62a41e6c8a238257ff38ea62
Submitter: Jenkins
Branch: stable/havana

commit 06dd47f3dcdd33da62a41e6c8a238257ff38ea62
Author: Solly Ross <email address hidden>
Date: Fri Jan 17 16:43:48 2014 -0500

    Explicitly tear down on error in libguestfs setup()

    Now, the teardown() method is called explicitly, which ensures that
    no "orphaned" VMs are left around in case of an error.

    Change-Id: I7fbe470046ce6c76bc13d77d8df360351a3ef715
    Fixes: bug #1270304
    (cherry picked from commit fa62cfe15fd6dddecf657966d166a2d3a2bdd317)

tags:

added: in-stable-havana

Thierry Carrez (ttx) on 2014-04-17

Changed in nova:
milestone:	icehouse-3 → 2014.1

Matt Riedemann (mriedem) on 2014-04-23

tags:

added: libguestfs libvirt

Qin Zhao (zhaoqin) wrote on 2014-05-19:

I can still see qemu-kvm processes that do not exit when using the stable Icehouse code, which also makes the nova process hang. Can anybody have a look at https://bugs.launchpad.net/nova/+bug/1313477 ?

Is there any way to diagnose why the qemu-kvm process forked by libguestfs does not exit?
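As a possible first step for that kind of diagnosis, one could list the qemu processes on the compute host together with their state and parent pid. This shows whether a leftover appliance is still running or is a zombie waiting to be reaped, and which process is responsible for reaping it. The helper below is hypothetical and not part of nova or libguestfs. It assumes Linux `/proc` and uses a heuristic: a process counts as libguestfs-launched if its command line mentions `.guestfs-` or `libguestfs`, both of which appear in the qemu-kvm command line quoted earlier in this report. A zombie's `/proc/<pid>/cmdline` is empty, so zombies can only be matched by command name.

```python
import os
from typing import Dict, List, Tuple


def parse_stat(text: str) -> Tuple[str, str, int]:
    """Return (comm, state, ppid) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')'.
    head, tail = text.rsplit(")", 1)
    comm = head.split("(", 1)[1]
    fields = tail.split()
    return comm, fields[0], int(fields[1])


def looks_like_guestfs(argv: List[str]) -> bool:
    """Heuristic: libguestfs appliance command lines reference these markers."""
    return any(".guestfs-" in a or "libguestfs" in a for a in argv)


def list_qemu_processes() -> List[Dict[str, object]]:
    """List qemu processes with their state, parent pid and a libguestfs flag."""
    found: List[Dict[str, object]] = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                comm, state, ppid = parse_stat(f.read())
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited or is not readable
        if "qemu" not in comm:
            continue
        argv = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        found.append({
            "pid": int(entry),
            "ppid": ppid,  # the process expected to reap this one
            "state": state,  # 'Z' means exited but not yet reaped
            # A zombie has an empty cmdline, so the heuristic cannot apply.
            "guestfs": looks_like_guestfs(argv) if argv else None,
        })
    return found


if __name__ == "__main__":
    for proc in list_qemu_processes():
        print(proc)
```

If the reported `ppid` is the nova-compute process and the state is `Z`, the appliance has exited but has not been reaped by its parent.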
