OpenStack Compute (Nova)

libvirt disk option "cache=none" prevents VM from booting on GlusterFS/FUSE

Reported by Travis Rhoden on 2012-03-19
This bug affects 3 people
Affects: OpenStack Compute (nova)
Importance: High
Assigned to: Daniel Berrange

Bug Description

I just upgraded all my Ubuntu and Essex packages to the latest for Ubuntu 12.04:

root@spcnode1:/var/log/nova# dpkg -l libvirt-bin
ii libvirt-bin 0.9.8-2ubuntu13 programs for the libvirt library
root@spcnode1:/var/log/nova# dpkg -l nova-compute
ii nova-compute 2012.1~rc1~20120316.13416-0ubuntu1 OpenStack Compute - compute node

Now VMs are refusing to start, with this error:

2012-03-19 12:14:03 ERROR nova.compute.manager [req-58618bbb-ee6c-4a98-adb0-dfb8b0856ada novaadmin proj] [instance: 6f81e765-c349-4493-bc49-bea964a76ff5] Instance failed to spawn
(nova.compute.manager): TRACE: Traceback (most recent call last):
(nova.compute.manager): TRACE: File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 592, in _spawn
(nova.compute.manager): TRACE: self._legacy_nw_info(network_info), block_device_info)
(nova.compute.manager): TRACE: File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
(nova.compute.manager): TRACE: return f(*args, **kw)
(nova.compute.manager): TRACE: File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 936, in spawn
(nova.compute.manager): TRACE: self._create_new_domain(xml)
(nova.compute.manager): TRACE: File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 1577, in _create_new_domain
(nova.compute.manager): TRACE: domain.createWithFlags(launch_flags)
(nova.compute.manager): TRACE: File "/usr/lib/python2.7/dist-packages/libvirt.py", line 581, in createWithFlags
(nova.compute.manager): TRACE: if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
(nova.compute.manager): TRACE: libvirtError: internal error process exited while connecting to monitor: char device redirected to /dev/pts/2
(nova.compute.manager): TRACE: kvm: -drive file=/mnt/vmstore/instances/instance-0000001a/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none: could not open disk image /mnt/vmstore/instances/instance-0000001a/disk: Invalid argument

Looking through the libvirt logs in /var/log/libvirt and /var/log/libvirt/qemu, I was able to see what was different about the -drive line passed to kvm: the addition of "cache=none".

I found the addition merged in a few days ago here: https://bazaar.launchpad.net/~nova-core/nova/github/revision/2855

If I edit my libvirt.xml.template file to remove "cache=none", everything boots up happily again.
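In libvirt's domain XML, the `cache=none` seen on the kvm `-drive` line comes from the `cache` attribute of a disk's `<driver>` element. The following is a minimal Python sketch, assuming a file-backed qcow2 disk on virtio as in the trace above, that builds that element with the cache mode passed in rather than hardcoded. `disk_xml` is a hypothetical helper, not Nova's actual template code.

```python
import xml.etree.ElementTree as ET

# Values accepted by the cache attribute of libvirt's <driver> element.
LIBVIRT_CACHE_MODES = {"default", "none", "writethrough", "writeback",
                       "directsync", "unsafe"}


def disk_xml(source_file, cache_mode, target_dev="vda"):
    """Build a <disk> element for a qcow2 image file attached via virtio.

    Hypothetical helper: cache_mode is a parameter instead of a fixed "none".
    """
    if cache_mode not in LIBVIRT_CACHE_MODES:
        raise ValueError("unknown libvirt cache mode: %r" % cache_mode)
    disk = ET.Element("disk", type="file", device="disk")
    # libvirt passes this attribute to QEMU as the cache= option on -drive.
    ET.SubElement(disk, "driver", name="qemu", type="qcow2", cache=cache_mode)
    ET.SubElement(disk, "source", file=source_file)
    ET.SubElement(disk, "target", dev=target_dev, bus="virtio")
    return ET.tostring(disk, encoding="unicode")
```

For example, `disk_xml("/mnt/vmstore/instances/instance-0000001a/disk", "writethrough")` yields a `<driver>` element with `cache="writethrough"`, which is the effect of removing the hardcoded value from the template.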

Travis Rhoden (trhoden) wrote :

After a bit more testing, this only happens when I am using GlusterFS as my VM store. In the above report, /mnt/vmstore is a GlusterFS mount (running Gluster 3.3beta). If I edit my nova.conf to remove "--instances_path", so that the default instance directory of /var/lib/nova/instances is used instead, cache=none is accepted fine.

That raises the question of whether specifying "cache=none" across the board is really ideal.

I'm not sure what the correct setting should be for users of GlusterFS. Perhaps documenting this potential tripping point is sufficient.

Russell Bryant (russellb) wrote :

Added Daniel Berrange, who was the author of the patch that added "cache=none".

Daniel Berrange (berrange) wrote :

Hmm, this message

> could not open disk image /mnt/vmstore/instances/instance-0000001a/disk: Invalid argument

probably means that the filesystem does not support Direct IO. AFAIK, all filesystems except tmpfs should support this, so I'll investigate what the score is wrt GlusterFS.
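With `cache=none`, QEMU opens the disk image with `O_DIRECT`, so the "Invalid argument" above is the `EINVAL` that `open()` returns on a filesystem without Direct I/O. Below is a minimal Python probe sketch, modeled on the `.direct_io.test` check described in the commit message further down. `supports_direct_io` is a hypothetical name, and treating `EINVAL` as "unsupported" is an assumption that matches this report.

```python
import errno
import os


def supports_direct_io(dirpath):
    """Return True if a file in dirpath can be opened with O_DIRECT.

    Assumption: a filesystem without Direct I/O support rejects the
    open() with EINVAL, the "Invalid argument" seen in the kvm error.
    """
    if not hasattr(os, "O_DIRECT"):
        # O_DIRECT is Linux-specific; report it as unsupported elsewhere.
        return False
    testfile = os.path.join(dirpath, ".direct_io.test")
    fd = None
    try:
        fd = os.open(testfile, os.O_CREAT | os.O_WRONLY | os.O_DIRECT, 0o600)
        return True
    except OSError as e:
        if e.errno == errno.EINVAL:
            return False
        raise  # e.g. EACCES: a real error, not an answer about O_DIRECT
    finally:
        if fd is not None:
            os.close(fd)
        # The file may have been created even if the open failed.
        if os.path.exists(testfile):
            os.unlink(testfile)
```

Calling `supports_direct_io("/mnt/vmstore/instances")` on the GlusterFS mount from this report would be expected to return `False`.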

Daniel Berrange (berrange) wrote :

Can you say what kernel version and glusterfs version you have, and show the /proc/mounts line for the glusterfs volume?

Travis Rhoden (trhoden) wrote :

You are correct that it is Direct IO. My Gluster mount is using FUSE, and FUSE does not support Direct IO. I see this topic come up every few weeks on the Gluster mailing list. It is a limitation of FUSE, not Gluster. Supposedly patches have been submitted to FUSE to add support for Direct IO, but those have not been accepted. Here is a recent mailing list thread about it: http://<email address hidden>/msg08353.html

So, in that sense, it's hard to say that you are doing anything wrong. Perhaps this isn't a bug, or perhaps the cache option needs to be configurable somehow. It's never been clear to me whether editing the libvirt XML template is an expected/normal thing to do in an OpenStack deployment.

For completeness, here is the info you asked for:
root@spcnode2:~# uname -a
Linux spcnode2 3.2.0-19-generic #30-Ubuntu SMP Fri Mar 16 16:27:15 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
root@spcnode2:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu precise (development branch)
Release: 12.04
Codename: precise

root@spcnode2:~# glusterfs --version
glusterfs 3.3beta2 built on Mar 6 2012 09:36:03

From /proc/mounts:
localhost:/vmstore /mnt/vmstore fuse.glusterfs rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072 0 0

Which is achieved by:
mount -t glusterfs localhost:/vmstore /mnt/vmstore
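The `fuse.glusterfs` type in that `/proc/mounts` line is what marks the volume as FUSE-backed. Here is a hypothetical Python helper that finds the mount holding a given path (the longest matching mount point) and checks whether it is FUSE. It parses text in `/proc/mounts` format and, as a simplification, does not decode the octal escapes (such as `\040` for a space) the kernel uses there. The filesystem type is only a hint; probing for O_DIRECT directly avoids maintaining a list of filesystems.

```python
import os

FUSE_TYPES = ("fuse", "fuseblk")


def mount_for_path(path, mounts_text):
    """Return (device, mountpoint, fstype) for the mount that holds path.

    mounts_text is in /proc/mounts format: one mount per line, fields
    "device mountpoint fstype options dump pass". Simplification: octal
    escapes such as \\040 (space) in mount points are not decoded.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mountpoint, fstype = fields[:3]
        prefix = mountpoint.rstrip("/") + "/"
        if path == mountpoint or path.startswith(prefix):
            # Longest mount point wins; on a tie the later line wins,
            # because a later mount on the same point hides the earlier one.
            if best is None or len(mountpoint) >= len(best[1]):
                best = (device, mountpoint, fstype)
    return best


def is_fuse(fstype):
    """FUSE mounts appear as "fuse", "fuseblk" or "fuse.<subtype>"."""
    return fstype in FUSE_TYPES or fstype.startswith("fuse.")
```

On a live host this could be called as `mount_for_path("/mnt/vmstore/instances", open("/proc/mounts").read())`.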

Daniel Berrange (berrange) wrote :

So what we'll want to do, then, is have Nova check whether the storage volume supports Direct I/O. If it does, use cache=none; otherwise, fall back to cache=writethrough, which does not use Direct I/O but is still crash safe.
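That decision can be sketched in Python as follows, assuming some probe callable (hypothetical, for example one that attempts an `O_DIRECT` open in the directory). The result is cached per directory so the check runs once rather than for every disk. `CacheModeSelector` is an illustrative name, not Nova code.

```python
from typing import Callable, Dict


class CacheModeSelector:
    """Pick a libvirt disk cache mode for an instances directory.

    probe is a hypothetical callable returning True when the directory
    supports O_DIRECT; its result is computed once per path and reused.
    """

    def __init__(self, probe: Callable[[str], bool]):
        self._probe = probe
        self._modes: Dict[str, str] = {}

    def cache_mode(self, path: str) -> str:
        if path not in self._modes:
            # cache=none needs O_DIRECT. writethrough goes through the host
            # page cache but only completes a write once it reaches storage,
            # so it remains crash safe.
            self._modes[path] = "none" if self._probe(path) else "writethrough"
        return self._modes[path]
```

Because the probe result is re-evaluated whenever a new selector is built (for example at service start), a filesystem that later gains O_DIRECT support would switch to `none` without code changes.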

Travis Rhoden (trhoden) wrote :

That sounds like a good plan, Daniel. That way if FUSE ever does support O_DIRECT, it will switch to cache=none without any code changes.

Fix proposed to branch: master
Review: https://review.openstack.org/5606

Changed in nova:
assignee: nobody → Daniel Berrange (berrange)
status: New → In Progress

@travis the patch I posted to Gerrit implements the dynamic checking of O_DIRECT support. While I don't have GlusterFS, I did test this by pointing the 'instances_path' config parameter at a tmpfs filesystem, which similarly lacks O_DIRECT. I could reproduce the error you saw, and after applying my patch, guests start successfully again.

tags: added: essex-rc-potential
summary: - libvirt disk option "cache=none" prevents VM from booting
+ libvirt disk option "cache=none" prevents VM from booting on
+ GlusterFS/FUSE
Joe Gordon (jogo) wrote :

With regard to the 'essex-rc-potential' tag: this looks like a new feature to me, so it should only target Folsom.

Daniel Berrange (berrange) wrote :

This isn't a new feature - this is fixing the regression I caused in GIT commit 9f9402693a4465346e2b901055f798ba139c130b

Changed in nova:
importance: Undecided → High

Reviewed: https://review.openstack.org/5606
Committed: http://github.com/openstack/nova/commit/78f3e76d695898aaf846efb9c420e146a982e689
Submitter: Jenkins
Branch: master

commit 78f3e76d695898aaf846efb9c420e146a982e689
Author: Daniel P. Berrange <email address hidden>
Date: Wed Mar 21 11:35:43 2012 +0000

    Fix launching of guests where instances_path is on GlusterFS

    The FUSE module does not (currently) support O_DIRECT on files.
    This prevents QEMU from starting guests using 'cache=none' for
    their disks located on a GlusterFS filesystem. The same also
    applies for a handful of other filesystems (notably tmpfs, or
    any other FUSE filesystem).

    This patch introduces a startup check in Nova compute service
    which tries to create a file $instances_path/.direct_io.test
    using the O_DIRECT flag. If this succeeds, then cache=none
    will be used for all disks, otherwise it will fallback to
    using cache=writethrough. While the latter does not have
    performance which is as consistent as cache=none, it is still
    host-crash safe and preserves data integrity with migration,
    if the filesystem is cache coherent (cluster filesystems like
    GlusterFS are, NFS by contrast is not).

    By doing the dynamic check for O_DIRECT, we ensure that if
    future FUSE modules gain O_DIRECT support, Nova will automatically
    do the right thing.

    * nova/tests/test_libvirt.py: Stub out os.open in
      the _check_xml_and_disk_driver() to enable testing of
      both O_DIRECT and non-O_DIRECT code paths
    * nova/tests/test_virt_drivers.py: Set instances_path to
      the current directory
    * nova/virt/libvirt.xml.template: Replace hardcoded 'none'
      string with the '$cachemode' variable for all disks.
      Add missing 'cache' attribute for the config disk
    * nova/virt/libvirt/connection.py: Check whether O_DIRECT
      is supported on the "FLAGS.instances_path" directory
      and use 'none' for cachemode if it is, 'writethrough'
      otherwise

    Bug: 959637
    Change-Id: I60cbff1c3ad8299fe2aa37099390f9235f6724d0
    Signed-off-by: Daniel P. Berrange <email address hidden>
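The test approach in the commit message (stubbing out `os.open` so that both the O_DIRECT and non-O_DIRECT paths run regardless of the host filesystem) can be sketched with `unittest.mock`. This is not the actual Nova test: `cache_mode_for` and `CacheModeTest` are hypothetical names. The supported case is simulated by stripping `O_DIRECT` from the flags before calling the real `open`.

```python
import errno
import os
import tempfile
import unittest
from unittest import mock


def cache_mode_for(dirpath):
    """Hypothetical helper: "none" if dirpath accepts O_DIRECT, else "writethrough"."""
    if not hasattr(os, "O_DIRECT"):
        return "writethrough"
    testfile = os.path.join(dirpath, ".direct_io.test")
    try:
        fd = os.open(testfile, os.O_CREAT | os.O_WRONLY | os.O_DIRECT, 0o600)
    except OSError as e:
        if e.errno != errno.EINVAL:
            raise
        mode = "writethrough"
    else:
        os.close(fd)
        mode = "none"
    if os.path.exists(testfile):
        os.unlink(testfile)
    return mode


class CacheModeTest(unittest.TestCase):
    def setUp(self):
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        self.dir = tmp.name

    @unittest.skipUnless(hasattr(os, "O_DIRECT"), "O_DIRECT is Linux-specific")
    def test_direct_io_supported(self):
        real_open = os.open

        def open_without_o_direct(path, flags, mode=0o777):
            # Simulate a filesystem that accepts O_DIRECT, whatever the host has.
            return real_open(path, flags & ~os.O_DIRECT, mode)

        with mock.patch("os.open", side_effect=open_without_o_direct):
            self.assertEqual(cache_mode_for(self.dir), "none")
        self.assertFalse(os.path.exists(os.path.join(self.dir, ".direct_io.test")))

    def test_direct_io_unsupported(self):
        # Simulate FUSE or tmpfs rejecting O_DIRECT with EINVAL.
        err = OSError(errno.EINVAL, "Invalid argument")
        with mock.patch("os.open", side_effect=err):
            self.assertEqual(cache_mode_for(self.dir), "writethrough")
```

Run it with `python -m unittest` against the file that contains it.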

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2012-03-26
Changed in nova:
milestone: none → essex-rc2
tags: removed: essex-rc-potential

Reviewed: https://review.openstack.org/5769
Committed: http://github.com/openstack/nova/commit/01e090963f75c2d97c42004a3df515ae5f6e652d
Submitter: Jenkins
Branch: milestone-proposed

commit 01e090963f75c2d97c42004a3df515ae5f6e652d
Author: Daniel P. Berrange <email address hidden>
Date: Wed Mar 21 11:35:43 2012 +0000

    Fix launching of guests where instances_path is on GlusterFS

    Bug: 959637
    Change-Id: I60cbff1c3ad8299fe2aa37099390f9235f6724d0
    Signed-off-by: Daniel P. Berrange <email address hidden>

Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2012-04-05
Changed in nova:
milestone: essex-rc2 → 2012.1
Jim Salter (jrssnet) wrote :

FWIW: ZFS on Linux apparently does not support Direct I/O either. I had the same problem: VMs with disks set to cache=none would not start. I thought it was a bug in KVM that made the option fail completely, until I found this bug while looking to file my own. When I moved my .qcow2 file to an ext4 partition, sure enough, cache=none worked fine.

I propose that a more informative error be reported when this occurs: "filesystem [path] does not support Direct I/O, cache=none not supported" would be a tremendous improvement over the generic "failed to connect to monitor" error that is thrown now.
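A sketch of such a pre-flight check in Python, assuming it runs against the image path before the guest is started. `check_cache_none_allowed` and `DirectIOUnsupportedError` are hypothetical names, and the message wording follows the proposal above. It tries the same kind of `O_DIRECT` open that `cache=none` implies and turns `EINVAL` into a readable error, while other errors (missing file, permissions) keep their own messages.

```python
import errno
import os


class DirectIOUnsupportedError(Exception):
    """Hypothetical error type with a clearer message than QEMU's."""


def check_cache_none_allowed(image_path):
    """Raise DirectIOUnsupportedError if image_path cannot use cache=none.

    cache=none makes QEMU open the image with O_DIRECT, so this tries the
    same kind of open before the guest is started.
    """
    if not hasattr(os, "O_DIRECT"):
        raise DirectIOUnsupportedError(
            "O_DIRECT is not available on this platform, "
            "cache=none not supported for %s" % image_path)
    try:
        fd = os.open(image_path, os.O_RDONLY | os.O_DIRECT)
    except OSError as e:
        if e.errno == errno.EINVAL:
            raise DirectIOUnsupportedError(
                "filesystem holding %s does not support Direct I/O, "
                "cache=none not supported" % image_path) from e
        raise  # missing file, permissions, etc. keep their own error
    os.close(fd)
```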

Jim Salter (jrssnet) wrote :

Actually, crap: I might need to re-file this bug anyway. I just noticed this is filed against OpenStack, and really this is a problem with KVM itself. (You don't get a helpful message when trying to start your guest with virsh or virt-manager either.)

Michael McGarrah (mcgarrah) wrote :

I ran into this as well with an iSCSI mount point exported from FreeNAS 8.3.1, on CentOS 6.4 and Ubuntu 12.04.2 LTS client systems. If the raw or qcow2 files are created on the iSCSI volume, they fail, because cache=none is the default.

The error message from qemu-kvm kept mentioning a bad parameter. You have to replace cache=none with cache=passthrough, and then it just starts working.

RAW files fail late, when the OS install is most of the way through.
QCOW2 files fail immediately, when the file is created by qemu-kvm.

This was hugely frustrating since if the files were created before the volume was converted to an iSCSI mountpoint, they appeared to work fine. It was when I tried to create my next image that it failed.

Hope this helps someone else.
