Comment 2 for bug 1333587

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/102224
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=994cdb234b2b16d97f0276c6356db65817944ee2
Submitter: Jenkins
Branch: master

commit 994cdb234b2b16d97f0276c6356db65817944ee2
Author: Matthew Booth <email address hidden>
Date: Tue Jun 24 12:12:59 2014 +0100

    VMware: Fix race in spawn() when resizing cached image

    spawn() guards against multiple threads simultaneously attempting to
    cache the same image, but it wasn't guarding against them
    simultaneously trying to create a resized copy in the cache. Attempting
    to create a large number of instances simultaneously from an uncached
    image would result in a race to create the resized image. This resulted
    in 2 classes of failed instance:

    1. Instances whose disk was a linked clone of a copy which had been
       subsequently overwritten. These were corrupt.
    2. Instances whose spawn() failed in ExtendVirtualDisk_Task due to a
       locked image.

    This patch creates a Nova-local lock for the resized image. The image
    is in a per-Nova directory on the datastore, so inter-Nova locking is
    not a concern. The lock guards both testing for the existence of the
    image, and its creation. Therefore when multiple processes race, only
    1 will create the resized copy, and all others will find and use it.
    In normal usage this will add the overhead of an additional
    uncontended local lock creation and deletion in spawn().
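
    The locking pattern described above can be sketched roughly as follows.
    This is an illustration only, not the code from the patch: the dict
    standing in for the datastore cache, the per-name threading.Lock
    standing in for the Nova-local lock, and all function names here are
    assumptions made for the example.

```python
import threading
from collections import defaultdict

# Hypothetical stand-ins: a dict plays the datastore cache and a per-name
# threading.Lock plays the Nova-local lock. Neither is Nova's real API.
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()
cache = {}
create_calls = []


def _lock_for(name):
    # Serialise access to the lock table itself so that every caller
    # asking for the same name gets the same lock object.
    with _locks_guard:
        return _locks[name]


def get_resized_image(image_id, size_gb):
    name = "%s.%d" % (image_id, size_gb)
    # The lock covers BOTH the existence test and the creation, so only
    # one racing caller creates the copy; the rest find and use it.
    with _lock_for(name):
        if name not in cache:
            create_calls.append(name)
            cache[name] = "resized copy of %s at %d GB" % (image_id, size_gb)
    return name


def spawn_many(n, image_id="img", size_gb=20):
    # Simulate n simultaneous spawn() calls for the same uncached image.
    threads = [
        threading.Thread(target=get_resized_image, args=(image_id, size_gb))
        for _ in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

    Calling spawn_many(16) in this sketch records a single entry in
    create_calls, however the threads interleave. Without the lock, two
    threads could both pass the existence test and both create the copy.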

    In wrapping this code in a lock, we also make certain that any failure
    to create the resized image is appropriately cleaned up. Otherwise
    subsequent users will attempt to use a corrupt copy.
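
    A minimal sketch of the cleanup-on-failure idea, assuming a local
    file stands in for the resized image on the datastore. The names
    create_resized_copy, failing_resize and ok_resize are hypothetical
    and do not come from the patch.

```python
import os
import tempfile
import threading

resize_lock = threading.Lock()  # stand-in for the Nova-local lock


def create_resized_copy(path, resize):
    """Create `path` under the lock; remove it again if `resize` raises.

    Returns True if this caller created the copy, False if it already
    existed.
    """
    with resize_lock:
        if os.path.exists(path):
            return False  # an earlier caller already created it
        try:
            with open(path, "w") as f:
                f.write("copy of base image")
            resize(path)
        except Exception:
            # Never leave a half-built copy behind for later callers.
            if os.path.exists(path):
                os.remove(path)
            raise
        return True


def failing_resize(path):
    raise RuntimeError("simulated resize failure")


def ok_resize(path):
    with open(path, "a") as f:
        f.write(" (resized)")
```

    Because the cleanup runs while the lock is still held, a waiting
    caller either sees a complete copy or no copy at all, and in the
    latter case it retries the creation itself.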

    Change-Id: I3df3d614656e511c909b6c1837582c0d34bf84c6
    Closes-bug: 1333587