[libvirt] resize fails when using NFS shared storage

Bug #1218372 reported by Xavier Queralt
This bug affects 3 people
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: High
Assigned to: Mark Doffman

Bug Description

Setup: two hosts installed using devstack in a multi-node configuration, with the directory /opt/stack/data/nova/instances/ shared over NFS.

When performing a resize I get the following error (Complete traceback in http://paste.openstack.org/show/45368/):

"qemu-img: Could not open '/opt/stack/data/nova/instances/7dbeb7f2-39e2-4f1d-8228-0b7a84d27745/disk': Permission denied\n"

This problem was introduced by patch https://review.openstack.org/28424, which modified the behaviour of migrate/resize when using shared storage. Before that patch, the disk was copied to the new host over ssh even when using shared storage (which could cause data loss if an error happened); now, when shared storage is detected, the disk is not sent to the other host at all and is simply assumed to be accessible from there. Since both hosts are using the same storage, why should this be a problem?

After doing some research on how NFS handles its shares on the client side, I realized that the NFS client keeps a cache mapping file names to inodes which, unless some process asks for the attributes first, is only refreshed at intervals of 3 to 60 seconds (see the nfs options ac[dir|reg][min|max] in the nfs manpage). So if a process tries to access a file that has been renamed on the remote server, it will still be accessing the old version, because the cached name still points to the old inode (the cache is not updated when accessing a file, only when asking for the file attributes, e.g. with ls -lh).
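
For reference, the caching options actually in effect on a client can be checked with something like the following (standard client-side commands, not taken from this report):

 nfsstat -m
 grep ' nfs' /proc/mounts

Both show the mount options applied to each NFS mount, including any ac*/actimeo settings.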

In the resize case, the source compute node renamed the instance directory to "$INSTANCE_DIR/<instance_uuid>_resize" (owned by root after qemu stops) and created the new instance disk from it under the new "$INSTANCE_DIR/<instance_uuid>" directory.

From the destination host, even though we were trying to access the new disk file at "$INSTANCE_DIR/<instance_uuid>/disk", we were still holding the old inode for that path, which pointed to "$INSTANCE_DIR/<instance_uuid>_resize/disk" (owned by root, inaccessible, the wrong image, etc.).

Mounting the NFS share with the "noac" option, which (from the manpage) "forces application writes to become synchronous so that local changes to a file become visible on the server immediately", prevents the files from getting out of sync, but it comes with the drawback of issuing a network call for every file operation, which may cause performance issues.
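
For illustration, mounting the share with this option would look something like the following (server and export names here are made up, not taken from this report):

 mount -t nfs -o noac nfsserver:/export/nova /opt/stack/data/nova/instances

or the equivalent 'noac' entry in the share's options in /etc/fstab.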

Tags: nfs resize
description: updated
Changed in nova:
assignee: nobody → Xavier Queralt (xqueralt)
Alan Pevec (apevec)
Changed in nova:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/44359

Changed in nova:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/58852

Revision history for this message
Rafi Khardalian (rkhardalian) wrote :

There's a much simpler solution here. We should simply recommend "lookupcache=none" be set as an NFS mount option.

Per the NFS man page:
===
If the client ignores its cache and validates every application lookup request with the server, that client can immediately detect when a new directory entry has been either created or removed by another client. You can specify this behavior using lookupcache=none. The extra NFS requests needed if the client does not cache directory entries can exact a performance penalty. Disabling lookup caching should result in less of a performance penalty than using noac, and has no effect on how the NFS client caches the attributes of files.
===
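
For example (again with made-up server and export names), that would be:

 mount -t nfs -o lookupcache=none nfsserver:/export/nova /opt/stack/data/nova/instances

or adding 'lookupcache=none' to the share's options in /etc/fstab.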

The other option is to flush all the caches (kernel doc snippet below):

===
Writing to this will cause the kernel to drop clean caches, dentries and
inodes from memory, causing that memory to become free.

To free pagecache:
 echo 1 > /proc/sys/vm/drop_caches
To free dentries and inodes:
 echo 2 > /proc/sys/vm/drop_caches
To free pagecache, dentries and inodes:
 echo 3 > /proc/sys/vm/drop_caches

As this is a non-destructive operation and dirty objects are not freeable, the
user should run `sync' first.
===

In our case, I'd propose running 'sync' and then echoing 2 into drop_caches on the destination.
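
Concretely, on the destination compute node that would be something like (run as root):

 sync
 echo 2 > /proc/sys/vm/drop_caches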

Changed in nova:
assignee: Xavier Queralt (xqueralt) → nobody
Tracy Jones (tjones-i)
Changed in nova:
status: In Progress → New
Tracy Jones (tjones-i)
Changed in nova:
status: New → Triaged
Sean Dague (sdague)
no longer affects: nova/grizzly
Sean Dague (sdague)
Changed in nova:
status: Triaged → Confirmed
Changed in nova:
assignee: nobody → venkatesh (p-venkatesh551)
assignee: venkatesh (p-venkatesh551) → nobody
Mark Doffman (mjdoffma)
Changed in nova:
assignee: nobody → Mark Doffman (mjdoffma)
Revision history for this message
Matt Riedemann (mriedem) wrote :

https://review.openstack.org/#/c/28424/ landed in Havana. Is this still valid? I know there were some fixes for Ceph shared storage and resize made in Kilo which we also backported to stable/juno. I'm not sure if those would also resolve issues for NFS, but I'd think they are related, so marking this invalid at this point. Please re-open if this is still an issue.

Changed in nova:
status: Confirmed → Invalid
tags: added: nfs resize
Revision history for this message
Matt Riedemann (mriedem) wrote :

This is the fix I was thinking of for resize with shared storage, made in kilo and backported to stable/juno:

https://review.openstack.org/#/c/139693/
