OSError failure to read when creating multiple instances with NFS

Bug #1809123 reported by Tim Rozet
This bug affects 3 people
Affects                  | Status        | Importance | Assigned to    | Milestone
OpenStack Compute (nova) | Fix Released  | Medium     | Nell Jerram    |
Queens                   | Fix Committed | Medium     | Matt Riedemann |
Rocky                    | Fix Committed | Medium     | Matt Riedemann |

Bug Description

There is a race condition when launching multiple instances over NFS simultaneously that can cause the os.utime function to fail when updating the mtime of the image base file:

2018-12-15 14:22:38.740 7 INFO nova.virt.libvirt.driver [req-d33edf35-733b-4591-831c-666cd159cee1 8965b22a11c44875a90fe88f50769a5a b9644067db0d44789e19d9d032287ada - default default] [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] Creating image
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [req-d33edf35-733b-4591-831c-666cd159cee1 8965b22a11c44875a90fe88f50769a5a b9644067db0d44789e19d9d032287ada - default default] [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] Instance failed to spawn: OSError: [Errno 13] Permission denied
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] Traceback (most recent call last):
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2252, in _build_resources
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] yield resources
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2032, in _build_and_run_instance
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] block_device_info=block_device_info)
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3091, in spawn
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] block_device_info=block_device_info)
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3469, in _create_image
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] fallback_from_host)
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3560, in _create_and_inject_local_root
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] instance, size, fallback_from_host)
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7634, in _try_fetch_image_cache
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] size=size)
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 243, in cache
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] *args, **kwargs)
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 601, in create_image
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] nova.privsep.path.utime(base)
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] File "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 207, in _wrap
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] return self.channel.remote_call(name, args, kwargs)
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 202, in remote_call
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] raise exc_type(*result[2])
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552] OSError: [Errno 13] Permission denied
2018-12-15 14:22:38.747 7 ERROR nova.compute.manager [instance: 6fec5d88-09ab-4ecc-815d-c08c298fe552]
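
For reference, the failing call is nova's qcow2 image-cache path bumping the mtime on the shared base image so the image cache manager sees it as recently used. A minimal sketch of that pattern, assuming a direct os.utime call (nova actually routes this through the privsep daemon as nova.privsep.path.utime):

import os
import time


def mark_base_image_used(base_path):
    """Bump the cached base image's atime/mtime to "now" so the image
    cache manager treats it as recently used. Over NFS, a concurrent
    instance build touching the same base file can make this raise
    OSError(EACCES), which is the failure shown in the traceback above."""
    now = time.time()
    # nova performs this via the privsep helper rather than calling os.utime directly.
    os.utime(base_path, (now, now))


if __name__ == '__main__':
    # Hypothetical invocation against the base file shown in the listings below.
    base = ('/var/lib/nova/instances/_base/'
            '78c6ae575fd026484b1f46588b5c337b3fd53558')
    if os.path.exists(base):
        mark_base_image_used(base)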

The NFS settings are correct:
[root@overcloud-novacompute-0 ~]# mount | grep nova
overcloud-controller-0.opnfvlf.org:/nova on /var/lib/nova/instances type nfs4 (rw,relatime,sync,context=system_u:object_r:nfs_t:s0,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,nosharecache,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.0.2.44,local_lock=none,addr=192.0.2.40,_netdev)

[root@overcloud-novacompute-0 ~]# getenforce
Permissive

[root@overcloud-novacompute-0 ~]# ls -lrt /var/lib/nova/
total 0
drwxrwxrwx. 4 42436 42436 53 Dec 14 19:52 instances

[root@overcloud-novacompute-0 ~]# ls -lrt /var/lib/nova/instances/
total 4
drwxr-xr-x. 2 42436 42436 54 Dec 14 19:16 _base
drwxr-xr-x. 2 42436 42436 93 Dec 14 19:52 locks
-rw-r--r--. 1 42436 42436 58 Dec 14 20:33 compute_nodes

[root@overcloud-novacompute-0 ~]# ls -lrt /var/lib/nova/instances/_base/
total 13384
-rw-r--r--. 1 42427 42427 46137344 Dec 14 19:18 78c6ae575fd026484b1f46588b5c337b3fd53558

[root@overcloud-novacompute-0 ~]# docker exec -it bf4658c561e1 /bin/bash
()[nova@overcloud-novacompute-0 /]$ id -u nova
42436
()[nova@overcloud-novacompute-0 /]$ id -u qemu
42427

()[nova@overcloud-novacompute-0 /]$ ls -lrt /var/lib/nova/instances/
total 4
drwxr-xr-x. 2 nova nova 54 Dec 14 19:16 _base
drwxr-xr-x. 2 nova nova 93 Dec 14 19:52 locks
-rw-r--r--. 1 nova nova 58 Dec 14 20:33 compute_nodes
()[nova@overcloud-novacompute-0 /]$ ls -lrt /var/lib/nova/instances/_base/
total 13384
-rw-r--r--. 1 qemu qemu 46137344 Dec 14 19:18 78c6ae575fd026484b1f46588b5c337b3fd53558

Tim Rozet (trozet)
Changed in nova:
assignee: nobody → Tim Rozet (trozet)
status: New → In Progress
Revision history for this message
Nell Jerram (neil-jerram) wrote :

I have been regularly hitting this too, in a test suite (for the Calico networking driver) using Queens.

The typical pattern for me is:
- launch an instance on a compute node (which is an NFS client); this succeeds
- after 20-30s, launch another instance on the same compute node; this fails with the same call stack as above.

But I guess that's close enough for us to be talking about the same root cause here.

The biggest thing that I don't understand about this is: why aren't there loads of OpenStack users hitting this all the time? Do most OpenStack deployments using Queens or later not use NFS for instance ephemeral storage? If they don't, is that because they don't care about live migration, or because they're getting it some other way?

(I should add: I'm pretty sure this problem was introduced going from Pike to Queens. I run the same test suite regularly on Pike and Queens, and the Pike test never sees a problem like this; and the nova.privsep.path.utime code was introduced between Pike and Queens.)

Revision history for this message
Nell Jerram (neil-jerram) wrote :

Since yesterday I have been running my test successfully with the workaround at https://review.openstack.org/#/c/633493/. In other words, just don't worry if that utime call fails. I'd be interested to hear if you think there's anything fundamentally wrong with that; it feels to me like a simpler fix than the locking that you have proposed.

Revision history for this message
Nell Jerram (neil-jerram) wrote :

Actually, after looking at your patch in detail, s/simpler/less mysterious/ !

Revision history for this message
Tim Rozet (trozet) wrote :

Hey Neil,
"The biggest thing that I don't understand about this is: why aren't there loads of OpenStack users hitting this all the time? Do most OpenStack deployment using Queens or later not use NFS for instance ephemeral storage? If not, is that because they don't care about live migration, or because they're getting that some other way?"

I wondered the same thing. I'm guessing most folks use something like Ceph for shared storage (at least users deploying with TripleO).

Looking at your patch, I think modifying the file time is done for a reason (I can't recall exactly what, but I remember it was to notify some other mechanism in nova about the cache), so rather than just ignoring the failure, my patch resolves it in my testing. Did you ever try using my patch to see if the problem still happens for you?

Revision history for this message
Nell Jerram (neil-jerram) wrote :

Tim, I'm sorry, I only just noticed your comment, because I forgot to subscribe to this bug.

I believe the base file time is modified so that the image cache manager knows that that base image was recently in use. (The "image cache manager" is actually an image garbage collector, IIUC.)

I haven't tried your patch instead of mine yet. I'll aim to do that next week, and will report back here.
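
To illustrate the role that timestamp plays, here is a purely hypothetical sketch of the kind of check such a garbage collector might run: a base file still referenced by an instance disk is always kept, and the mtime only matters for deciding when an unreferenced base file is old enough to delete (function and parameter names are assumptions for illustration, not nova's actual image cache manager code):

import os
import time


def maybe_remove_unused_base(base_path, referenced_paths, max_age_seconds=86400):
    """Hypothetical GC check: keep any base file still referenced by an
    instance disk; otherwise remove it only once its mtime shows it has
    been unused for longer than max_age_seconds."""
    if base_path in referenced_paths:
        return False  # still in use; the mtime is irrelevant here
    age = time.time() - os.path.getmtime(base_path)
    if age > max_age_seconds:
        os.remove(base_path)
        return True
    return False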

Revision history for this message
Nell Jerram (neil-jerram) wrote :

Tim - just wanted to let you know that I've today been able to test that your fix (as at https://review.openstack.org/#/c/625741/2) seems reliably successful with my team's system testing. I've run system tests with your fix for about 15 hours today, and they were all successful. (I'll comment this on the change as well.)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/625741
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=525631d8dc058910728e55def616358b0e7f2f69
Submitter: Zuul
Branch: master

commit 525631d8dc058910728e55def616358b0e7f2f69
Author: Tim Rozet <email address hidden>
Date: Mon Dec 17 19:44:54 2018 -0500

    Fixes race condition with privsep utime

    There is a race condition that occurs over NFS when multiple instances
    are being created where utime fails, due to some other process
    modifying the file path. This patch ensures the path is created and
    is readable before attempting to modify with utime.

    Closes-Bug: 1809123

    Change-Id: Id68aa27a8ab08d9c00655e5ed6b48d194aa8e6f6
    Signed-off-by: Tim Rozet <email address hidden>
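
The approach described in the commit message, in rough terms, is to make sure the base file is actually present and readable before touching its timestamps, so the utime does not race with another process that is still creating the file. A hedged sketch of that idea (illustrative only, including an illustrative retry loop; not the exact nova code):

import os
import time


def utime_when_readable(base_path, attempts=3, delay=1.0):
    """Only update the base file's mtime once it exists and is readable,
    retrying briefly to ride out a concurrent writer. Illustrative
    sketch of the idea in the fix, not the exact nova implementation."""
    for _ in range(attempts):
        if os.path.exists(base_path) and os.access(base_path, os.R_OK):
            now = time.time()
            os.utime(base_path, (now, now))
            return True
        time.sleep(delay)
    return False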

Changed in nova:
status: In Progress → Fix Released
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/641023

Matt Riedemann (mriedem)
Changed in nova:
assignee: Tim Rozet (trozet) → Neil Jerram (neil-jerram)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/641315
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=362bba9011f8445285c3f72a0b00630163bfbc2f
Submitter: Zuul
Branch: master

commit 362bba9011f8445285c3f72a0b00630163bfbc2f
Author: Neil Jerram <email address hidden>
Date: Sat Jan 26 21:22:33 2019 +0000

    Allow utime call to fail on qcow2 image base file

    This utime call sometimes fails, with EACCES (Permission Denied), when
    the base file is on an NFS client filesystem. I don't understand why,
    but wonder if it's a similar problem as the one that motivated using
    touch instead of utime in ec9d5e375e2. In any case, IIUC, timing
    isn't the primary thing that the image cache manager uses to determine
    when the base file is in use. The primary mechanism for that is
    whether there is a matching disk file for a current instance. The
    timestamp on the base file is only used when deciding whether to
    delete a base file that is _not_ in use; so it is not a big deal if
    that deletion happens slightly earlier, for an unused base file,
    because of one of these preceding utime calls having failed.

    Closes-Bug: #1809123
    Co-Authored-By: Matthew Booth <email address hidden>
    Change-Id: Idc131ff426f1707150867030fa5a69b77a7fc832
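
In code terms, this change treats the mtime update as best-effort: catch the permission error and log it instead of failing the whole instance build. A minimal sketch of that approach (illustrative; not the exact nova code):

import errno
import logging
import os
import time

LOG = logging.getLogger(__name__)


def best_effort_utime(base_path):
    """Update the base file's mtime, but tolerate EACCES (as seen over
    NFS), since the timestamp only influences when an unused base file
    gets garbage collected. Illustrative sketch, not the exact nova code."""
    now = time.time()
    try:
        os.utime(base_path, (now, now))
    except OSError as exc:
        if exc.errno != errno.EACCES:
            raise
        LOG.warning("Failed to update mtime of %s: %s", base_path, exc)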

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/642997

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/642998

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/643011

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/rocky)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/rocky
Review: https://review.openstack.org/642998
Reason: Proper backport is here:

https://review.openstack.org/#/c/643011/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/queens)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/queens
Review: https://review.openstack.org/642997
Reason: Fixing up the cherry pick here:

https://review.openstack.org/#/c/633493/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/rocky)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/rocky
Review: https://review.openstack.org/641023

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/643011
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7d0dd625d35a08d159b50154809a53e8444ae25c
Submitter: Zuul
Branch: stable/rocky

commit 7d0dd625d35a08d159b50154809a53e8444ae25c
Author: Neil Jerram <email address hidden>
Date: Sat Jan 26 21:22:33 2019 +0000

    Allow utime call to fail on qcow2 image base file

    This utime call sometimes fails, with EACCES (Permission Denied), when
    the base file is on an NFS client filesystem. I don't understand why,
    but wonder if it's a similar problem as the one that motivated using
    touch instead of utime in ec9d5e375e2. In any case, IIUC, timing
    isn't the primary thing that the image cache manager uses to determine
    when the base file is in use. The primary mechanism for that is
    whether there is a matching disk file for a current instance. The
    timestamp on the base file is only used when deciding whether to
    delete a base file that is _not_ in use; so it is not a big deal if
    that deletion happens slightly earlier, for an unused base file,
    because of one of these preceding utime calls having failed.

    Closes-Bug: #1809123
    Co-Authored-By: Matthew Booth <email address hidden>
    Change-Id: Idc131ff426f1707150867030fa5a69b77a7fc832
    (cherry picked from commit 362bba9011f8445285c3f72a0b00630163bfbc2f)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/633493
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2b1544390cd0afd4d2b0c6ea4636a49a73b50dcf
Submitter: Zuul
Branch: stable/queens

commit 2b1544390cd0afd4d2b0c6ea4636a49a73b50dcf
Author: Neil Jerram <email address hidden>
Date: Sat Jan 26 21:22:33 2019 +0000

    Allow utime call to fail on qcow2 image base file

    This utime call sometimes fails, with EACCES (Permission Denied), when
    the base file is on an NFS client filesystem. I don't understand why,
    but wonder if it's a similar problem as the one that motivated using
    touch instead of utime in ec9d5e375e2. In any case, IIUC, timing
    isn't the primary thing that the image cache manager uses to determine
    when the base file is in use. The primary mechanism for that is
    whether there is a matching disk file for a current instance. The
    timestamp on the base file is only used when deciding whether to
    delete a base file that is _not_ in use; so it is not a big deal if
    that deletion happens slightly earlier, for an unused base file,
    because of one of these preceding utime calls having failed.

    Conflicts:
          nova/tests/unit/virt/libvirt/test_imagebackend.py

    NOTE(mriedem): Conflict is due to not having change
    I154b1ef14dbace803841cb3eed3ff02dd0020588 in Queens.

    Closes-Bug: #1809123
    Co-Authored-By: Matthew Booth <email address hidden>
    Change-Id: Idc131ff426f1707150867030fa5a69b77a7fc832
    (cherry picked from commit 362bba9011f8445285c3f72a0b00630163bfbc2f)
    (cherry picked from commit 7d0dd625d35a08d159b50154809a53e8444ae25c)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.10

This issue was fixed in the openstack/nova 17.0.10 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.2.0

This issue was fixed in the openstack/nova 18.2.0 release.
