[libvirt] resize fails when using NFS shared storage

Bug #1218372 reported by Xavier Queralt
This bug affects 3 people
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: High
Assigned to: Mark Doffman

Bug Description

Setup: two hosts installed using devstack in a multi-node configuration, with the directory /opt/stack/data/nova/instances/ shared over NFS.

When performing a resize I get the following error (Complete traceback in http://paste.openstack.org/show/45368/):

"qemu-img: Could not open '/opt/stack/data/nova/instances/7dbeb7f2-39e2-4f1d-8228-0b7a84d27745/disk': Permission denied\n"

This problem was introduced by patch https://review.openstack.org/28424, which modified the behaviour of migrate/resize when using shared storage. Before that patch, the disk was copied to the new host over ssh even when using shared storage (which could cause data loss if an error happened); now, when shared storage is detected, the disk is not sent to the other host at all and is simply assumed to be accessible from there. Since both hosts are using the same storage, why should this be a problem?

After doing some research on how NFS handles its shares on the client side, I realized that the NFS client keeps a cache mapping file names to inodes which, unless some process asks for the attributes first, is only refreshed at intervals of 3 to 60 seconds (see the nfs options ac[dir|reg][min|max] in the nfs manpage). So if a process tries to access a file that has been renamed on the remote server, it will still be accessing the old version, because the cached name still points to the old inode (the cache is not updated when accessing a file, only when asking for the file attributes, e.g. with ls -lh).
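
For reference, the caching options actually in effect on a client can be checked with something like the following (standard client-side commands, not taken from this report):

 nfsstat -m
 grep ' nfs' /proc/mounts

Both show the mount options applied to each NFS mount, including any ac*/actimeo settings.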

In the resize case, the source compute node renamed the instance directory to "$INSTANCE_DIR/<instance_uuid>_resize" (owned by root after qemu stops) and created the new instance disk from it under the new "$INSTANCE_DIR/<instance_uuid>" directory.

From the destination host, even though we were trying to access the new disk file at "$INSTANCE_DIR/<instance_uuid>/disk", we were still holding the old inode for that path, which pointed to "$INSTANCE_DIR/<instance_uuid>_resize/disk" (owned by root, inaccessible, the wrong image, etc.).

Mounting the NFS share with the "noac" option, which (from the manpage) "forces application writes to become synchronous so that local changes to a file become visible on the server immediately", prevents the files from getting out of sync, but it comes with the drawback of issuing a network call for every file operation, which may cause performance issues.
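
For illustration, mounting the share with this option would look something like the following (server and export names here are made up, not taken from this report):

 mount -t nfs -o noac nfsserver:/export/nova /opt/stack/data/nova/instances

or the equivalent 'noac' entry in the share's options in /etc/fstab.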

Tags: nfs resize
description: updated
Changed in nova:
assignee: nobody → Xavier Queralt (xqueralt)
Alan Pevec (apevec)
Changed in nova:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/44359

Changed in nova:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/58852

Revision history for this message
Rafi Khardalian (rkhardalian) wrote :

There's a much simpler solution here. We should simply recommend "lookupcache=none" be set as an NFS mount option.

Per the NFS man page:
===
If the client ignores its cache and validates every application lookup request with the server, that client can immediately detect when a new directory entry has been either created or removed by another client. You can specify this behavior using lookupcache=none. The extra NFS requests needed if the client does not cache directory entries can exact a performance penalty. Disabling lookup caching should result in less of a performance penalty than using noac, and has no effect on how the NFS client caches the attributes of files.
===
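
For example (again with made-up server and export names), that would be:

 mount -t nfs -o lookupcache=none nfsserver:/export/nova /opt/stack/data/nova/instances

or adding 'lookupcache=none' to the share's options in /etc/fstab.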

The other option is to flush all the caches (kernel doc snippet below):

===
Writing to this will cause the kernel to drop clean caches, dentries and
inodes from memory, causing that memory to become free.

To free pagecache:
 echo 1 > /proc/sys/vm/drop_caches
To free dentries and inodes:
 echo 2 > /proc/sys/vm/drop_caches
To free pagecache, dentries and inodes:
 echo 3 > /proc/sys/vm/drop_caches

As this is a non-destructive operation and dirty objects are not freeable, the
user should run `sync' first.
===

In our case, I'd propose running 'sync' and then echoing 2 into drop_caches on the destination.
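
Concretely, on the destination compute node that would be something like (run as root):

 sync
 echo 2 > /proc/sys/vm/drop_caches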

Changed in nova:
assignee: Xavier Queralt (xqueralt) → nobody
Tracy Jones (tjones-i)
Changed in nova:
status: In Progress → New
Tracy Jones (tjones-i)
Changed in nova:
status: New → Triaged
Sean Dague (sdague)
no longer affects: nova/grizzly
Sean Dague (sdague)
Changed in nova:
status: Triaged → Confirmed
Changed in nova:
assignee: nobody → venkatesh (p-venkatesh551)
assignee: venkatesh (p-venkatesh551) → nobody
Mark Doffman (mjdoffma)
Changed in nova:
assignee: nobody → Mark Doffman (mjdoffma)
Revision history for this message
Matt Riedemann (mriedem) wrote :

https://review.openstack.org/#/c/28424/ landed in Havana. Is this still valid? I know there were some fixes for Ceph shared storage and resize made in Kilo which we also backported to stable/juno. I'm not sure if those would also resolve issues for NFS, but I'd think they are related, so marking this invalid at this point. Please re-open if this is still an issue.

Changed in nova:
status: Confirmed → Invalid
tags: added: nfs resize
Revision history for this message
Matt Riedemann (mriedem) wrote :

This is the fix I was thinking of for resize with shared storage, made in kilo and backported to stable/juno:

https://review.openstack.org/#/c/139693/
