Comment 3 for bug 1646896

Mihail (mihail.itgix) wrote :

We have reason to believe that the root cause is a deadlock between the NFS client and the NFS server, which run on the same machine and compete for the same memory. This happens with loopback NFS mounts, because nova mounts volumes over NFS even when they are local.

More information can be found in this article: https://lwn.net/Articles/595652/

Although this is purely an NFS issue, it can be mitigated in nova, since no fix in the NFS package has appeared for years.

Our proposal is the following: add a check in nova to determine whether the volume comes from a local cinder storage node. If the storage is local, do a "bind" mount instead of a normal NFS client mount. This works around the NFS loopback deadlock.
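
To illustrate the idea separately from the nova code, here is a minimal standalone sketch of the decision between a bind mount and an NFS mount. It is not the patch itself: the function names (split_share, resolve_host, build_mount_cmd) and the local_addrs parameter are hypothetical, and the caller is assumed to supply the set of local addresses (for example collected with netifaces). It compares the host part of the share exactly instead of searching for the address as a substring, and it resolves the host first so that a hostname is treated like an IP address. It assumes shares of the form host:/path with an IPv4 address or a hostname.

```python
import socket


def split_share(nfs_share):
    """Split 'host:/export/path' into (host, path)."""
    host, sep, path = nfs_share.partition(":")
    if not sep or not host or not path:
        raise ValueError("not a host:/path share: %r" % nfs_share)
    return host, path


def resolve_host(host):
    """Return the set of IP addresses that host resolves to (may be empty)."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return set()
    # The first element of each sockaddr tuple is the address string.
    return {info[4][0] for info in infos}


def build_mount_cmd(nfs_share, mount_path, local_addrs,
                    nfs_mount_options=None, options=None,
                    resolver=resolve_host):
    """Build the mount command: bind mount if the share is local, else NFS."""
    host, path = split_share(nfs_share)
    # Exact address comparison: '192.168.0.1' must not match '192.168.0.110'.
    if resolver(host) & set(local_addrs):
        return ["mount", "-o", "bind", path, mount_path]
    cmd = ["mount", "-t", "nfs"]
    if nfs_mount_options is not None:
        cmd.extend(["-o", nfs_mount_options])
    if options:
        cmd.extend(options.split(" "))
    cmd.extend([nfs_share, mount_path])
    return cmd
```

The resolver argument exists only so the logic can be exercised without DNS; in real use the default would be kept.
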
The change has two known imperfections. A bind mount is not discovered by the is_mounted method, so the share is mounted again on every request. Also, a local share is currently not detected if the nfs_shares address is a hostname instead of an IP address.
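
One possible way around the repeated mounts, sketched here under the assumption that it is acceptable to check only the mount point and not the source, is to look the path up in /proc/mounts. A bind mount is listed there with the underlying device as its source, so a check keyed on the NFS share string would miss it, while a check on the target path would not. The helper names below (is_mount_point, is_mounted) are hypothetical and are not the existing nova method.

```python
import os
import re


def _unescape(field):
    # /proc/mounts writes space, tab, newline and backslash as octal escapes.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def is_mount_point(mount_path, mounts_text):
    """Return True if mount_path is listed as a mount point in mounts_text."""
    wanted = os.path.normpath(mount_path)
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # The second field is the mount point, for bind and NFS mounts alike.
        if os.path.normpath(_unescape(fields[1])) == wanted:
            return True
    return False


def is_mounted(mount_path, mounts_file="/proc/mounts"):
    """Check the live mount table (Linux only)."""
    with open(mounts_file) as f:
        return is_mount_point(mount_path, f.read())
```
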

Here is the diff of the proposed change:

49,55d48
< """Misho : Needed for nfs client to recognize local shares """
< import socket
< import fcntl
< import struct
< import netifaces
< import pprint
<
937d929
< LOG.debug("Checking if the mount path is mounted: " + mount_path)
945,964c937
< address = []
<
< for iface in netifaces.interfaces():
< try:
< alladdr = netifaces.ifaddresses(iface)[netifaces.AF_INET]
< ## [{'peer': '127.0.0.1', 'netmask': '255.0.0.0', 'addr': '127.0.0.1'} ...
< for theaddr in alladdr:
< address.append( theaddr['addr'] ) # '192.168.0.110'
< except:
< LOG.debug("Couldn't get address for :"+iface)
<
<
< islocal = False
< for addr in address:
< if addr in nfs_share :
< nfs_cmd = ['mount', '-o', 'bind']
< nfsserver, nfs_path = nfs_share.split(':')
< islocal = True
< LOG.warn("address "+str(addr)+ " found in mount string.")
< nfs_cmd.extend([nfs_path, mount_path])
---
>
966,975c939,944
< if islocal:
< LOG.warn("Address of share is local, do bind mount")
<
< else:
< nfs_cmd = ['mount', '-t', 'nfs']
< if CONF.libvirt.nfs_mount_options is not None:
< nfs_cmd.extend(['-o', CONF.libvirt.nfs_mount_options])
< if options:
< nfs_cmd.extend(options.split(' '))
< nfs_cmd.extend([nfs_share, mount_path])
---
> nfs_cmd = ['mount', '-t', 'nfs']
> if CONF.libvirt.nfs_mount_options is not None:
> nfs_cmd.extend(['-o', CONF.libvirt.nfs_mount_options])
> if options:
> nfs_cmd.extend(options.split(' '))
> nfs_cmd.extend([nfs_share, mount_path])