havana on CentOS instance file injection problem

Bug #1246852 reported by Joe Breu
This bug affects 10 people
Affects                   Status     Importance  Assigned to     Milestone
OpenStack Compute (nova)  Won't Fix  Medium      Sahid Orentino

Bug Description

Using havana with CentOS packages from EPEL

On an instance launch, there is a qemu-kvm process that lingers after the instance is launched (in addition to the regular qemu process that the instance is running under). This appears to be caused by file injection.

2013-10-31 19:07:43.542 5275 DEBUG nova.virt.disk.api [req-f2aee8ed-e382-4487-8b75-b4d1eed76ae4 b82b8e94e4b64a2bb8b21eef98b1f5e8 51c36ca36a37475da1d1a2662576688f] Checking if we can resize image /var/lib/nova/instances/5f69468b-544c-4162-88ce-dfcc3a3e7568/disk. size=1073741824 can_resize_image /usr/lib/python2.6/site-packages/nova/virt/disk/api.py:157
2013-10-31 19:07:43.659 5275 DEBUG nova.virt.disk.api [req-f2aee8ed-e382-4487-8b75-b4d1eed76ae4 b82b8e94e4b64a2bb8b21eef98b1f5e8 51c36ca36a37475da1d1a2662576688f] Checking if we can resize filesystem inside /var/lib/nova/instances/5f69468b-544c-4162-88ce-dfcc3a3e7568/disk. CoW=True is_image_partitionless /usr/lib/python2.6/site-packages/nova/virt/disk/api.py:171
2013-10-31 19:07:43.661 5275 DEBUG nova.virt.disk.vfs.api [req-f2aee8ed-e382-4487-8b75-b4d1eed76ae4 b82b8e94e4b64a2bb8b21eef98b1f5e8 51c36ca36a37475da1d1a2662576688f] Instance for image imgfile=/var/lib/nova/instances/5f69468b-544c-4162-88ce-dfcc3a3e7568/disk imgfmt=qcow2 partition=None instance_for_image /usr/lib/python2.6/site-packages/nova/virt/disk/vfs/api.py:31
2013-10-31 19:07:43.662 5275 DEBUG nova.virt.disk.vfs.api [req-f2aee8ed-e382-4487-8b75-b4d1eed76ae4 b82b8e94e4b64a2bb8b21eef98b1f5e8 51c36ca36a37475da1d1a2662576688f] Trying to import guestfs instance_for_image /usr/lib/python2.6/site-packages/nova/virt/disk/vfs/api.py:34
2013-10-31 19:07:43.680 5275 DEBUG nova.virt.disk.vfs.api [req-f2aee8ed-e382-4487-8b75-b4d1eed76ae4 b82b8e94e4b64a2bb8b21eef98b1f5e8 51c36ca36a37475da1d1a2662576688f] Using primary VFSGuestFS instance_for_image /usr/lib/python2.6/site-packages/nova/virt/disk/vfs/api.py:41
2013-10-31 19:07:43.685 5275 DEBUG nova.virt.disk.vfs.guestfs [req-f2aee8ed-e382-4487-8b75-b4d1eed76ae4 b82b8e94e4b64a2bb8b21eef98b1f5e8 51c36ca36a37475da1d1a2662576688f] Setting up appliance for /var/lib/nova/instances/5f69468b-544c-4162-88ce-dfcc3a3e7568/disk qcow2 setup /usr/lib/python2.6/site-packages/nova/virt/disk/vfs/guestfs.py:111
2013-10-31 19:08:30.379 5275 DEBUG nova.virt.disk.vfs.guestfs [req-f2aee8ed-e382-4487-8b75-b4d1eed76ae4 b82b8e94e4b64a2bb8b21eef98b1f5e8 51c36ca36a37475da1d1a2662576688f] Mount guest OS image /var/lib/nova/instances/5f69468b-544c-4162-88ce-dfcc3a3e7568/disk partition None setup_os_static /usr/lib/python2.6/site-packages/nova/virt/disk/vfs/guestfs.py:57
2013-10-31 19:08:30.644 5275 DEBUG nova.virt.disk.api [req-f2aee8ed-e382-4487-8b75-b4d1eed76ae4 b82b8e94e4b64a2bb8b21eef98b1f5e8 51c36ca36a37475da1d1a2662576688f] Unable to mount image /var/lib/nova/instances/5f69468b-544c-4162-88ce-dfcc3a3e7568/disk with error Error mounting /var/lib/nova/instances/5f69468b-544c-4162-88ce-dfcc3a3e7568/disk with libguestfs (mount_options: /dev/vda on / (options: ''): mount: you must specify the filesystem type). Cannot resize. is_image_partitionless /usr/lib/python2.6/site-packages/nova/virt/disk/api.py:183

Process list:
nova 5403 30.6 6.8 1009036 271024 ? S 19:07 1:44 /usr/libexec/qemu-kvm -global virtio-blk-pci.scsi=off -nodefconfig -nodefaults -nographic -drive file=/var/lib/nova/instances/5f69468b-544c-4162-88ce-dfcc3a3e7568/disk,cache=none,format=qcow2,if=virtio -nodefconfig -machine accel=kvm:tcg -m 500 -no-reboot -device virtio-serial -serial stdio -device sga -chardev socket,path=/tmp/libguestfsQqFC2I/guestfsd.sock,id=channel0 -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 -kernel /var/tmp/.guestfs-162/kernel.5275 -initrd /var/tmp/.guestfs-162/initrd.5275 -append panic=1 console=ttyS0 udevtimeout=300 no_timer_check acpi=off printk.time=1 cgroup_disable=memory selinux=0 TERM=screen -drive file=/var/tmp/.guestfs-162/root.5275,snapshot=on,if=virtio,cache=unsafe

qemu 5736 31.4 7.3 2056408 292336 ? S 19:08 1:31 /usr/bin/qemu-system-x86_64 -name instance-00000025 -S -M rhel6.4.0 -cpu Opteron_G3,+nodeid_msr,+wdt,+skinit,+ibs,+osvw,+3dnowprefetch,+cr8legacy,+extapic,+cmp_legacy,+3dnow,+3dnowext,+pdpe1gb,+fxsr_opt,+mmxext,+ht,+vme -no-kvm -m 512 -smp 1,sockets=1,cores=1,threads=1 -uuid 5f69468b-544c-4162-88ce-dfcc3a3e7568 -smbios type=1,manufacturer=Red Hat Inc.,product=OpenStack Nova,version=2013.2-2.el6,serial=cef1de3d-e616-b2d2-95d8-5c8f36a149f1,uuid=5f69468b-544c-4162-88ce-dfcc3a3e7568 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-00000025.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/5f69468b-544c-4162-88ce-dfcc3a3e7568/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=23,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=fa:16:3e:1e:6d:10,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/5f69468b-544c-4162-88ce-dfcc3a3e7568/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
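
A minimal sketch, assuming `ps aux`-style output like the listing above, of how the lingering libguestfs appliance could be told apart from the instance's own qemu process. The helper name and the abbreviated sample lines are hypothetical; the only detail taken from the listing is that the appliance command line carries `org.libguestfs.channel` in its virtserialport name.

```python
def find_libguestfs_appliances(ps_output):
    """Return (user, pid, executable) for processes that look like libguestfs appliances."""
    matches = []
    for line in ps_output.splitlines():
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        user, pid, command = fields[0], fields[1], fields[10]
        if "org.libguestfs.channel" in command:
            matches.append((user, int(pid), command.split()[0]))
    return matches


# Abbreviated from the process list above (most arguments dropped for brevity).
SAMPLE_PS = (
    "nova 5403 30.6 6.8 1009036 271024 ? S 19:07 1:44 /usr/libexec/qemu-kvm "
    "-nographic -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0\n"
    "qemu 5736 31.4 7.3 2056408 292336 ? S 19:08 1:31 /usr/bin/qemu-system-x86_64 "
    "-name instance-00000025 -S\n"
)

appliances = find_libguestfs_appliances(SAMPLE_PS)
print(appliances)
```

Only the first process should be reported, since the instance's own qemu command line does not reference the libguestfs channel.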

[root@breu-mini-centos-compute2 ~]# rpm -qa | grep guestfs
libguestfs-1.16.34-2.el6.x86_64
libguestfs-tools-1.16.34-2.el6.x86_64
python-libguestfs-1.16.34-2.el6.x86_64
libguestfs-tools-c-1.16.34-2.el6.x86_64

We have set the following in nova.conf in an attempt to disable file injection altogether:
libvirt_inject_key=false
libvirt_inject_password=false
libvirt_inject_partition=-1
injected_network_template=""

However, instance creation still appears to use guestfs and fail. The qemu-kvm process lingers after the instance is destroyed, and the problem is repeatable.
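
For reference, the Havana option help text for `libvirt_inject_partition` reads, as far as I recall it: -2 disables injection, -1 inspects the image (libguestfs only), 0 means not partitioned, and a positive value is a partition number. The sketch below assumes that reading and simply decodes the value from the settings above; the `[DEFAULT]` header and the helper name are added for illustration. Note also that the log above shows guestfs being reached through is_image_partitionless() during the resize check, which these injection options may not govern.

```python
import configparser

# The settings quoted above, with a [DEFAULT] header added so configparser accepts them.
NOVA_CONF_FRAGMENT = """
[DEFAULT]
libvirt_inject_key=false
libvirt_inject_password=false
libvirt_inject_partition=-1
injected_network_template=""
"""


def describe_inject_partition(value):
    """Decode libvirt_inject_partition per the (assumed) Havana help text."""
    if value == -2:
        return "disabled"
    if value == -1:
        return "inspect (libguestfs only)"
    if value == 0:
        return "not partitioned"
    if value > 0:
        return "partition number %d" % value
    raise ValueError("unexpected value: %d" % value)


parser = configparser.ConfigParser()
parser.read_string(NOVA_CONF_FRAGMENT)
inject_partition = int(parser.defaults()["libvirt_inject_partition"])
print(describe_inject_partition(inject_partition))
```

Under that reading, -1 still asks libguestfs to inspect the image, whereas -2 would be the value that disables injection.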

Tags: libvirt resize
Revision history for this message
Matt Riedemann (mriedem) wrote :

Bug 1241659 looks related.

Revision history for this message
Matt Riedemann (mriedem) wrote :

What do you have set for use_cow_images in nova.conf?

What's the metadata on the image?

Revision history for this message
Yukinori Sagara (sagaray) wrote :

We also hit the same issue during a stress test.
Our configuration is CentOS 6.4 (x86_64) and RDO havana (openstack-nova-compute-2013.2-2.el6.noarch).

This issue comes from insufficient error handling in nova around VM image setup using libguestfs.

For some reason, the qemu processes created by libguestfs get stuck, which is effectively a memory leak. Eventually, qemu process invocations start failing because of insufficient memory.

We need to patch two files to resolve the issue, as follows:

https://github.com/openstack/nova/blob/master/nova/virt/disk/api.py
https://github.com/openstack/nova/blob/master/nova/virt/disk/vfs/guestfs.py

 * nova/virt/disk/api.py
   1. is_image_partitionless() (used by filesystem resize)
     1-1. move "fs = vfs.VFS.instance_for_image(image, 'qcow2', None)"
          to just above the 'try' statement.
          (instance_for_image() only does the dynamic import of the
          VFS implementation class, so this is OK.)
     1-2. add a 'finally' clause to the 'try' statement, and move
          'fs.teardown()' into it.

   2. inject_data()
     2-1. same as 1-1 and 1-2.

 * nova/virt/disk/vfs/guestfs.py
   3. setup()
     3-1. remove 'self.handle = None' from the exception handling, so
          that teardown() can release the resource.
   4. teardown()
     4-1. before releasing the resource, check 'self.handle is not None'.
          if self.handle is None, return immediately.
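
Items 3 and 4 can be sketched roughly as follows. This is not the nova code: `FakeHandle` and `SketchVFS` are hypothetical stand-ins, and the real libguestfs handle API is not used here. The point is only that setup() leaves the handle in place on failure, and teardown() tolerates a missing handle.

```python
class FakeHandle:
    """Hypothetical stand-in for a handle that owns a helper qemu process."""

    def __init__(self):
        self.closed = False

    def launch(self, fail):
        if fail:
            raise RuntimeError("appliance failed to start")

    def close(self):
        self.closed = True


class SketchVFS:
    def __init__(self):
        self.handle = None

    def setup(self, fail=False):
        self.handle = FakeHandle()
        # Item 3-1: no "self.handle = None" on error, so the handle
        # survives for teardown() to release.
        self.handle.launch(fail)

    def teardown(self):
        # Item 4-1: nothing to release if setup() never created a handle.
        if self.handle is None:
            return
        try:
            self.handle.close()
        finally:
            self.handle = None


def demo_failed_setup():
    fs = SketchVFS()
    seen = None
    try:
        fs.setup(fail=True)
    except RuntimeError:
        seen = fs.handle  # still set, because setup() did not clear it
    finally:
        fs.teardown()
    return seen.closed, fs.handle


print(demo_failed_setup())
```

Calling teardown() on an object whose setup() was never run is a no-op in this sketch, which is what lets the caller put it in a 'finally' clause unconditionally.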

Revision history for this message
evanjfraser (evanjfraser) wrote :

Hello,
I'm also hitting the same problem with Fedora 19 + RDO.

Thanks to Yukinori Sagara for providing the workaround. I've attached a patch based on his directions, in case it is helpful to anyone.

It appears to do the trick, but I'd be very grateful if someone would check it.

Regards, Evan.

Revision history for this message
Yukinori Sagara (sagaray) wrote :

Thanks, evanjfraser, for submitting the patch.
I checked the patch and found some points that need to be fixed.

 * nova/virt/disk/api.py
   1. is_image_partitionless()
      The patch moves 'return False' into the 'finally' clause.
      I think we must 'return True' at the bottom of the method if the guestfs calls succeed.
      'return False' should be at the bottom of the 'except' clause.

   2. inject_data()
      ** addition **
      I forgot to write one thing in the last post, sorry.

      We should call 'fs.teardown()' every time after 'fs.setup()' has been called.

      We need to merge the two try statements, and move 'return inject_data_into_fs()' to the bottom of the try clause
      (to the line after 'fs.setup()').

 * nova/virt/disk/vfs/guestfs.py
   3. setup()
      'self.handle = None' remains in the 'except Exception' clause.
      This prevents us from releasing self.handle's resources in teardown().

   4. teardown()
      The patch is almost OK, but I think its 'self.handle is None' check is nested too deep.
      It would be better right after 'LOG.debug()' or right after the first 'try:'.
      (The second 'try:' is too deep for the None check.)

Revision history for this message
evanjfraser (evanjfraser) wrote :

Hi Yukinori,

Re: 2. inject_data()

Do you mean to put the fs.teardown() in the try like:

    try:
        fs.setup()
        fs.teardown()
        return inject_data_into_fs(fs, key, net, metadata,

Or to leave the fs.teardown() in the finally clause?

Regards, Evan.

Revision history for this message
evanjfraser (evanjfraser) wrote :

Actually never mind, the finally clause would never get reached. It will either return from the Try or the Except clauses.

Revision history for this message
evanjfraser (evanjfraser) wrote :

Hello again, I've submitted a revised patch. Many thanks,

Evan.

Revision history for this message
Yukinori Sagara (sagaray) wrote :

Hi evanjfraser, thanks for your repeated replies.

> Re: 2. inject_data()

> Or to leave the fs.teardown() in the finally clause?

That is correct.
fs.teardown() should be called every time, whether an exception
occurs or not.

> Actually never mind, the finally clause would never get reached. It will either
> return from the Try or the Except clauses.

Just to confirm, the 'finally' clause always runs once the try
statement has been entered, whether or not an exception occurs, and
even if 'return' is called in the try or except clause.

Below is an example:

def is_image_partitionless():
    ...
    fs = vfs.VFS.instance_for_image()  # (1)
    try:
        fs.setup()  # (2)
    except Exception:
        ...
        return False  # (3)
    finally:
        fs.teardown()  # (4)

    # other check (try - except - finally)

    return True  # (5)

* is_image_partitionless() successful exit path

  (1) -> (2) -> (4) -> (5)

* is_image_partitionless() abnormal exit path

  (1) -> (2) -> (3) -> (4)
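
The two exit paths can be reproduced with a small standalone trace. This is only a sketch of Python's try/except/finally ordering; `run` and the numbered trace entries are hypothetical and stand in for the numbered steps in the example.

```python
def run(setup_fails):
    """Record the order in which the numbered steps execute."""
    trace = []

    def check():
        trace.append(1)          # (1) create the fs object
        try:
            trace.append(2)      # (2) fs.setup()
            if setup_fails:
                raise RuntimeError("setup failed")
        except RuntimeError:
            trace.append(3)      # (3) return False
            return False
        finally:
            trace.append(4)      # (4) fs.teardown(), runs on both paths
        trace.append(5)          # (5) return True
        return True

    result = check()
    return result, trace


print(run(setup_fails=False))
print(run(setup_fails=True))
```

Step (4) appears in both traces, including the one where the except clause has already executed its 'return'.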

> file_injection_v2.patch

Thanks for the revised patch.

* line 33: the indent is too deep.
* lines 36 and 50-52 are wrong: 'fs.teardown()' should be called in the finally clause.

I think above is right. Please check it.

def inject_data(...):
    ...
    if use_cow:
        fmt = "qcow2"
    fs = vfs.VFS.instance_for_image...
    try:
        os.stat(image) # added in current master
        fs.setup()
        return inject_data_into_fs(fs, ...) # 'fs' using here
    except Exception as e:
        ...
        return False
    finally:
        fs.teardown()

# The try statement in inject_data() should not be split, so that the
# finally clause reliably runs after any fs exception.

Revision history for this message
Yukinori Sagara (sagaray) wrote :

Sorry, I made a mistake.

> I think above is right. Please check it.
I think below is right. Please check it.

Joe Breu (breu)
Changed in nova:
status: New → Confirmed
assignee: nobody → Joseph W. Breu (breu)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/59853

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
Joe Breu (breu) wrote :

I have verified that the patch works and am submitting it upstream.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/59854

Revision history for this message
Yukinori Sagara (sagaray) wrote :

Thanks, rackerjoe.
I checked both Gerrit reviews, but neither patch is sufficient.

* api.py
    As Daniel Berrange also pointed out, an exception from fs.setup() is not handled.
    * https://review.openstack.org/#/c/59853/1/nova/virt/disk/api.py
      newer version, line 338
    * https://review.openstack.org/#/c/59854/1/nova/virt/disk/api.py
      newer version, line 326

    To move fs.teardown() into the finally block,
    'fs = vfs.VFS.instance_for_image' needs to move outside the try block.

    This is an example.
    --------------------------------
    fs = vfs.VFS.instance_for_image(...)
    try:
        os.stat(image)
        fs.setup()
        return inject_data_into_fs(...)
    except Exception as e:
        for inject in ....
    ...
    return False
    finally:
        fs.teardown()
    --------------------------------

* guestfs.py
    guestfs.py still contains an old comment.

    The patch removes 'self.handle = None', so the
    '# dereference object and implicitly close()' comment should
    also be removed.

    * https://review.openstack.org/#/c/59853/1/nova/virt/disk/vfs/guestfs.py
      newer version, line 121
    * https://review.openstack.org/#/c/59854/1/nova/virt/disk/vfs/guestfs.py
      newer version, line 125

Revision history for this message
Yukinori Sagara (sagaray) wrote :

Example indent was broken. Below is correct.

    --------------------------------
    fs = vfs.VFS.instance_for_image(...)
    try:
        os.stat(image)
        fs.setup()
        return inject_data_into_fs(...)
    except Exception as e:
        for inject in ....
        ...
        return False
    finally:
        fs.teardown()
    --------------------------------

Revision history for this message
stevenguo (steven-fl2000) wrote :

I downloaded the files from gerrit, but found the patch isn't sufficient.

If the swap disk setting in the flavor is not zero, openstack will fail to start the instance.
If the swap disk setting in the flavor is zero, everything will be fine and the instance can be started successfully.

So it seemed that we forgot to consider the injected swap disk setting.

Revision history for this message
stevenguo (steven-fl2000) wrote :

Found another problem related to resize: the resize works, but the newly added disk space shows up as free space and is not merged into the existing root filesystem as before.

I started an instance with a flavor of "8Cpu, 8G memory,100G disk". After the instance started, I checked the disk with the disk utility tool inside the instance and found 2 disk devices: /dev/vda1 is 20G, and /dev/vda (the newly added space) shows as 80G of free space.

By comparison, on an instance from another openstack cloud (an older havana version installed 2 months earlier), the resized (newly added) 80G of space is merged into /dev/vda1 rather than appearing as new free space on a new device.

I will try the old api.py and guestfs.py files from the old havana system to see whether it works as well as before.

Revision history for this message
Yukinori Sagara (sagaray) wrote :

Hello stevenguo.

I think you have encountered a different problem.

The topic of this bug report is the qemu process leak.
Instance boot succeeds, but if you keep the host under heavy load for a period of time, qemu processes leak.

Your problem is that instance boot (with a swap disk) always fails.
The code I patched does not include the swap-disk routines,
so I think your boot problem is a separate one.

About resizing:

Resize behavior seems to differ depending on whether the image file has a partition table.
Could you check your image file?

Revision history for this message
stevenguo (steven-fl2000) wrote :

Thank you very much, Yukinori!

Revision history for this message
dubi (dubi-il) wrote :

Has the above been fixed in Icehouse? I get the following error when launching an Ubuntu 12.4 image or a Fedora 19 image (~2G in size):

2014-06-03 12:57:30.287 11842 DEBUG nova.virt.disk.api [req-6eae3328-28b8-41a4-957b-c3ffa22f8931 a4291f4775864bb982a1da28db202f96 fb7478566b4544578bc7ae15e7443b65] Unable to mount image /var/lib/nova/instances/58ffb7c4-0db2-4926-8004-54c2c5f6ea55/disk with error Error mounting /var/lib/nova/instances/58ffb7c4-0db2-4926-8004-54c2c5f6ea55/disk with libguestfs (mount_options: /dev/sda on / (options: ''): mount: you must specify the filesystem type). Cannot resize. is_image_partitionless /usr/lib/python2.6/site-packages/nova/virt/disk/api.py:211

The values in nova.conf are :

libvirt_inject_key=true
libvirt_inject_password=false
libvirt_inject_partition=-1 (or -2)
injected_network_template==$pybasedir/nova/virt/interfaces.template

The launch is stuck in the 'spawn' state.

The Ubuntu image has a swap partition.

Such a launch worked for me with Havana stable/RDO!

On the other hand, when I launch a cirros3.0 image (13M) I do not get the above error and the instance becomes Active.
The cirros image has only one entry in the partition table (fdisk -l), with no swap.

If a patch is needed, where can I get it?

Revision history for this message
dubi (dubi-il) wrote :

It seems that the above error message should be a Warning (or, better, INFO), since it is printed in is_image_partitionless() and indicates that the image has more than 1 partition, so the function returns False. The exception that produces this message is thrown by code called from libguestfs and should not be treated as an error!

Anyhow, this leaves the problem of a partitioned image launch getting stuck in the nova 'spawn' state still unsolved
(at least in an all-in-one deployment of Icehouse).

Matt Riedemann (mriedem)
Changed in nova:
status: In Progress → Triaged
assignee: rackerjoe (breu) → nobody
importance: Undecided → Medium
tags: added: libvirt resize
Lawrance (jing)
Changed in nova:
status: Triaged → Confirmed
Changed in nova:
assignee: nobody → sahid (sahid-ferdjaoui)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/118677

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
Sean Dague (sdague) wrote :

patch is WIP

Changed in nova:
status: In Progress → Confirmed
Revision history for this message
Sean Dague (sdague) wrote :

havana is now closed

Changed in nova:
status: Confirmed → Won't Fix
Revision history for this message
Imtiaz Chowdhury (chowdhury-imtiaz) wrote :

From the comment thread, which seems to refer to more than one issue, it is unclear whether the defect was ever resolved in Havana. I tried file injection with a CentOS 6.5 image on Havana and it didn't work. It works fine with the cirros image.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/118677
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
