error in guestfs driver setup causes orphaned libguestfs processes

Bug #1270304 reported by Solly Ross
This bug affects 5 people
Affects                    Status         Importance   Assigned to   Milestone
OpenStack Compute (nova)   Fix Released   Undecided    Solly Ross
Havana                     Fix Released   Undecided    Xing Yang

Bug Description

In the libguestfs driver for Nova VFS, certain errors in the setup method can cause a libguestfs process to remain running, even after the VM is terminated and/or the method that spawned the libguestfs VM has finished. These processes become zombies when killed, and can only be destroyed by restarting nova-compute.

In the case encountered, the error occurred when using Gluster as the Cinder backend and attempting to launch a VM before adding a keypair. However, the same failure could plausibly be triggered by other errors in setup.

Solly Ross (sross-7)
Changed in nova:
assignee: nobody → Solly Ross (sross-7)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/67586

Changed in nova:
status: New → In Progress
David Wittman (david-wittman) wrote :

Here are the qemu-kvm zombie processes which I'm seeing as a result of this same issue:

/usr/libexec/qemu-kvm -global virtio-blk-pci.scsi=off -nodefconfig -nodefaults -nographic -machine accel=kvm:tcg -cpu host,+kvmclock -m 500 -no-reboot -kernel /var/tmp/.guestfs-162/kernel.16420 -initrd /var/tmp/.guestfs-162/initrd.16420 -device virtio-scsi-pci,id=scsi -drive file=/var/lib/nova/instances/c1c6ab55-b09a-498e-b007-47195911b084/disk,cache=none,format=qcow2,id=hd0,if=none -device scsi-hd,drive=hd0 -drive file=/var/tmp/.guestfs-162/root.16420,snapshot=on,id=appliance,if=none,cache=unsafe -device scsi-hd,drive=appliance -device virtio-serial -serial stdio -device sga -chardev socket,path=/tmp/libguestfs9AiaSX/guestfsd.sock,id=channel0 -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 -append panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 TERM=xterm

Solly, can you restore your abandoned changes in Gerrit? I don't know why that gate failed but your patch LGTM and it fixed my issue.
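
For anyone trying to spot these leftovers on a compute host, the command line above carries a few strings that distinguish a libguestfs appliance from a regular guest: the `guestfsd.sock` chardev, the `org.libguestfs.channel` port name, and the `.guestfs-` temp directory. A minimal sketch of a filter over process command lines, assuming those markers are stable across versions (they are taken only from the output above; the function names are hypothetical):

```python
# Heuristic markers copied from the qemu-kvm command line in this report.
# They are an assumption, not a documented libguestfs contract.
GUESTFS_MARKERS = ("guestfsd.sock", "org.libguestfs.channel", "/.guestfs-")


def looks_like_guestfs_appliance(cmdline):
    """Return True if a process command line resembles a libguestfs appliance."""
    return "qemu" in cmdline and any(m in cmdline for m in GUESTFS_MARKERS)


def instance_disk(cmdline):
    """Return the path from the first '-drive file=...' option, or None."""
    tokens = cmdline.split()
    for i, tok in enumerate(tokens[:-1]):
        if tok == "-drive":
            for field in tokens[i + 1].split(","):
                if field.startswith("file="):
                    return field[len("file="):]
    return None
```

Feeding it the lines of `ps -eo args` (or the contents of `/proc/<pid>/cmdline` with NUL bytes replaced by spaces) gives a list of candidate appliances and the instance disk each one has open. It only identifies the processes; it does not explain why they failed to exit.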

Solly Ross (sross-7) wrote :

@david-wittman: the change has been restored (it was a common gate bug). Hopefully someone can review it soon.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/67586
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fa62cfe15fd6dddecf657966d166a2d3a2bdd317
Submitter: Jenkins
Branch: master

commit fa62cfe15fd6dddecf657966d166a2d3a2bdd317
Author: Solly Ross <email address hidden>
Date: Fri Jan 17 16:43:48 2014 -0500

    Explicity teardown on error in libguestfs setup()

    Previously, in the setup method of the libguestfs driver
    for Nova's VFS, on an error the handle object was simply
    dereferenced (set to None), and then an new error was thrown.
    This relied on an implicit close() on the handle being called
    by GC. However, in some cases the setup progresses far enough
    that the implicit close is not enough, and leaves an "orphaned"
    libguestfs VM.

    Now, the teardown() method is called explicitly, which ensures that
    no "orphaned" VMs are left around in case of an error.

    Change-Id: I7fbe470046ce6c76bc13d77d8df360351a3ef715
    Fixes: bug #1270304
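
The pattern the commit message describes can be sketched roughly as follows. This is not the merged Nova code: `FakeGuestHandle` and `SketchVFS` are hypothetical stand-ins, and the fake handle only imitates a handle whose appliance has already started when the error surfaces. The point is the `except` branch, which calls `teardown()` instead of just dropping the reference and relying on garbage collection.

```python
class FakeGuestHandle:
    """Hypothetical stand-in for a libguestfs handle (not the real guestfs API)."""

    def __init__(self, fail_on_launch=False):
        self.fail_on_launch = fail_on_launch
        self.appliance_running = False
        self.closed = False

    def launch(self):
        # Simulate the appliance VM starting before the failure is raised.
        self.appliance_running = True
        if self.fail_on_launch:
            raise RuntimeError("simulated failure after appliance start")

    def close(self):
        self.appliance_running = False
        self.closed = True


class SketchVFS:
    """Illustrative driver skeleton; names and structure are assumptions."""

    def __init__(self, handle_factory):
        self._handle_factory = handle_factory
        self.handle = None

    def setup(self):
        self.handle = self._handle_factory()
        try:
            self.handle.launch()
        except Exception as exc:
            # Explicit cleanup, rather than only `self.handle = None`.
            self.teardown()
            raise RuntimeError("setup failed: %s" % exc) from exc

    def teardown(self):
        if self.handle is not None:
            try:
                self.handle.close()
            finally:
                self.handle = None
```

With `fail_on_launch=True`, `setup()` still raises, but the handle has been closed and cleared by the time the caller sees the error.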

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/77914

Thierry Carrez (ttx)
Changed in nova:
milestone: none → icehouse-3
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/havana)

Reviewed: https://review.openstack.org/77914
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=06dd47f3dcdd33da62a41e6c8a238257ff38ea62
Submitter: Jenkins
Branch: stable/havana

commit 06dd47f3dcdd33da62a41e6c8a238257ff38ea62
Author: Solly Ross <email address hidden>
Date: Fri Jan 17 16:43:48 2014 -0500

    Explicity teardown on error in libguestfs setup()

    Previously, in the setup method of the libguestfs driver
    for Nova's VFS, on an error the handle object was simply
    dereferenced (set to None), and then an new error was thrown.
    This relied on an implicit close() on the handle being called
    by GC. However, in some cases the setup progresses far enough
    that the implicit close is not enough, and leaves an "orphaned"
    libguestfs VM.

    Now, the teardown() method is called explicitly, which ensures that
    no "orphaned" VMs are left around in case of an error.

    Change-Id: I7fbe470046ce6c76bc13d77d8df360351a3ef715
    Fixes: bug #1270304
    (cherry picked from commit fa62cfe15fd6dddecf657966d166a2d3a2bdd317)

tags: added: in-stable-havana
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-3 → 2014.1
Matt Riedemann (mriedem)
tags: added: libguestfs libvirt
Qin Zhao (zhaoqin) wrote :

I can still see qemu-kvm processes failing to exit with the stable Icehouse code, which also makes the nova process hang. Could somebody take a look at https://bugs.launchpad.net/nova/+bug/1313477 ?

Is there any way to diagnose why a qemu-kvm process forked by libguestfs does not exit?
