EMC VMAX cinder volumes can get out of sync and point at the wrong backend vol
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Cinder | Fix Released | Medium | Helen Walsh | | |
Bug Description
An earlier email was sent to Xing on this. I'll describe the generic issue here.
The VMAX cinder driver locates volumes on the backend by persisting the provider_location information. For example:
'provider_
This is a CIM object path. The main locator attribute is the DeviceID (00088) within the given VMAX array. The shortcoming is that the DeviceID is only unique at an instant in time. A device can be deallocated from the storage pool, and the same DeviceID can later be bound again to a different volume. There are at least two cases where this can cause a problem for cinder:
1. The volume (volA) is deleted out-of-band, such that OpenStack is not aware of it.
2. The volume (volA) is requested to be deleted through cinder, but the delete times out because the max polling retry attempts are exceeded. See https:/
(a) In both of these cases, the cinder volume record remains, but the volume on the backend is actually gone (at some point).
(b) Now a new volume is created (volB). It is assigned the freed-up deviceID. So there are two cinder volumes with the same provider_location information.
(c) Critical data is added to volB.
(d) The administrator requests that volA be deleted (for example, they see it failed the first time per case #2 above, and they want to try again).
(e) Since the provider_location of volA now points to volB's backend volume, the new volume (volB) is deleted and its data is lost.
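Steps (a) through (e) can be replayed against a toy model. This is only a sketch: `FakeArray`, `cinder_db`, and `replay_scenario` are hypothetical names invented for illustration, and the "lowest free ID first" allocation is an assumption that stands in for whatever reuse policy the real array applies.

```python
import itertools


class FakeArray:
    """Hypothetical backend that reuses freed DeviceIDs (lowest free ID first)."""

    def __init__(self):
        self.devices = {}  # DeviceID -> device record

    def create(self):
        # Hand out the lowest DeviceID that is not currently in use.
        for n in itertools.count(1):
            device_id = f"{n:05d}"
            if device_id not in self.devices:
                self.devices[device_id] = {"data": None}
                return device_id

    def delete(self, device_id):
        del self.devices[device_id]


def replay_scenario():
    array = FakeArray()
    cinder_db = {}  # cinder volume name -> persisted DeviceID

    cinder_db["volA"] = array.create()
    array.delete(cinder_db["volA"])        # (a) backend volume gone, cinder record remains
    cinder_db["volB"] = array.create()     # (b) volB receives the freed DeviceID
    array.devices[cinder_db["volB"]]["data"] = "critical"  # (c)
    collided = cinder_db["volA"] == cinder_db["volB"]
    array.delete(cinder_db["volA"])        # (d) admin retries the delete of volA
    volb_survived = cinder_db["volB"] in array.devices     # (e)
    return collided, volb_survived
```

In this model `replay_scenario()` reports that both records share one DeviceID and that volB's backend device no longer exists after the retried delete of volA.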
Proposal:
In addition to the CIM object path, store the "EMCWWN" property value in the provider_location as well.
<PROPERTY NAME="EMCWWN" TYPE="string">
<VALUE>
</PROPERTY>
This should be the code page 83 ID, which is globally unique across time. Now modify the _find_lun(self, volume) method so that, after looking up the CIM instance, it verifies the instance's EMCWWN value against the one stored in the provider_location. If they match, all is good. If they do not match, the instance is not the volume that cinder was tracking: the tracked volume is actually gone from the backend, and a lookup error should be raised.
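The proposed check might look roughly like the sketch below. It is not the driver's code: `VolumeLookupError` and `verify_tracked_volume` are hypothetical names, the provider_location and the CIM instance are modelled as plain dicts, and the WWN strings in the usage note are placeholders rather than real values.

```python
class VolumeLookupError(Exception):
    """Hypothetical error: the tracked volume is no longer on the backend."""


def verify_tracked_volume(provider_location, backend_instance):
    """Return the backend instance only if it is the volume cinder was tracking.

    Both arguments are plain dicts in this sketch. The DeviceID locates the
    instance; the EMCWWN confirms that the DeviceID was not reused.
    """
    stored_wwn = provider_location["EMCWWN"]
    found_wwn = backend_instance["EMCWWN"]
    if found_wwn != stored_wwn:
        raise VolumeLookupError(
            "DeviceID %s now belongs to a different volume"
            % provider_location["DeviceID"]
        )
    return backend_instance
```

For example, with a stored location of `{"DeviceID": "00088", "EMCWWN": "wwn-A"}`, an instance carrying `"wwn-A"` is returned unchanged, while an instance with the same DeviceID but `"wwn-B"` raises `VolumeLookupError`, so a delete request for volA would stop before touching volB. Records persisted before the EMCWWN was stored would need separate handling, which this sketch does not cover.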
Changed in cinder: milestone: none → ocata-1
See this patch: https://review.openstack.org/#/c/140910/. We are saving volume uuid in ElementName. Volume uuid is unique. Even if you have two volumes with the same object path, the ElementName will be different. So I don't think EMCWWN is needed. Anyway, we'll see how to address this problem. Thanks.