commit 994cdb234b2b16d97f0276c6356db65817944ee2
Author: Matthew Booth <email address hidden>
Date: Tue Jun 24 12:12:59 2014 +0100
VMware: Fix race in spawn() when resizing cached image
spawn() guards against multiple threads simultaneously attempting to
cache the same image, but it wasn't guarding against them
simultaneously trying to create a resized copy in the cache. Attempting
to create a large number of instances simultaneously from an uncached image
would result in a race to create the resized image. This resulted in 2
classes of failed instance:
1. Instances whose disk was a linked clone of a copy which had been
subsequently overwritten. These were corrupt.
2. Instances whose spawn() failed in ExtendVirtualDisk_Task due to a
locked image.
This patch creates a Nova-local lock for the resized image. The image
is in a per-Nova directory on the datastore, so inter-Nova locking is
not a concern. The lock guards both testing for the existence of the
image, and its creation. Therefore when multiple processes race, only
1 will create the resized copy, and all others will find and use it.
In normal usage this will add the overhead of an additional
uncontended local lock creation and deletion in spawn().
In wrapping this code in a lock, we also make certain that any failure
to create the resized image is appropriately cleaned up. Otherwise
subsequent users will attempt to use a corrupt copy.
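
The locking and cleanup described above can be illustrated with a minimal sketch. This is not the code from the patch: it uses only the Python standard library, and the names ensure_resized_copy, _named_lock, create_copy and the "<image>.<size>.vmdk" file naming are hypothetical, chosen for illustration. It assumes an in-process lock is sufficient, as the message states for the per-Nova cache directory.

```python
import os
import threading
from collections import defaultdict

# Hypothetical stand-in for a Nova-local named lock (assumption, not the
# driver's real helper): one threading.Lock per lock name.
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


def _named_lock(name):
    """Return the process-local lock associated with `name`."""
    with _locks_guard:
        return _locks[name]


def ensure_resized_copy(cache_dir, image_id, size_gb, create_copy):
    """Return the path of the resized copy, creating it at most once.

    `create_copy(path)` is a caller-supplied callable (hypothetical) that
    produces the resized image at `path`.
    """
    # Hypothetical naming scheme for the resized copy in the cache.
    path = os.path.join(cache_dir, "%s.%d.vmdk" % (image_id, size_gb))

    # The existence test and the creation happen under the same lock, so
    # racing callers cannot both decide that the copy is missing.
    with _named_lock(path):
        if os.path.exists(path):
            return path
        try:
            create_copy(path)
        except Exception:
            # Remove any partial copy so later callers do not reuse it.
            if os.path.exists(path):
                os.remove(path)
            raise
    return path
```

If several threads call ensure_resized_copy() for the same image and size, the first one to take the lock creates the copy and the rest find it already present. A failed creation leaves nothing behind, so the next caller retries instead of using a partial file.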
Reviewed: https://review.openstack.org/102224
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=994cdb234b2b16d97f0276c6356db65817944ee2
Submitter: Jenkins
Branch: master
Change-Id: I3df3d614656e511c909b6c1837582c0d34bf84c6
Closes-bug: 1333587