EMC VMAX cinder volumes can get out of sync and point at the wrong backend vol

Bug #1401297 reported by Carl Pecinovsky
This bug affects 1 person
Affects: Cinder
Status: Fix Released
Importance: Medium
Assigned to: Helen Walsh

Bug Description

An earlier email was sent to Xing on this. I'll describe the generic issue here.

The VMAX cinder driver locates volumes on the backend by persisting the provider_location information. For example:

'provider_location': u"{'classname': u'Symm_StorageVolume', 'keybindings': {'CreationClassName': u'Symm_StorageVolume', 'SystemName': u'SYMMETRIX+000198701426', 'DeviceID': u'00088', 'SystemCreationClassName': u'Symm_StorageSystem'}}"

This is a CIM object path. The main locator attribute is the DeviceID (00088) within the given VMAX array. The shortcoming is that a DeviceID is unique only at a given point in time: it can be released back to the storage pool and later reassigned to a different volume. There are at least two cases where this can cause a problem for cinder:

1. The volume (volA) is deleted out-of-band, such that OpenStack is not aware of it.
2. The volume (volA) is deleted through cinder, but the delete times out because the maximum number of polling retries is exceeded. See https://bugs.launchpad.net/cinder/+bug/1401279

(a) In both cases, the cinder volume record remains, but the volume on the backend is (eventually) gone.
(b) A new volume (volB) is then created and assigned the freed-up DeviceID, so two cinder volumes now have the same provider_location information.
(c) Critical data is added to volB.
(d) The administrator requests that volA be deleted (for example, they see it failed the first time per case #2 above, and they want to try again).
(e) Since the provider_location of volA now points to volB's backend volume, volB is deleted and its data is lost.
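The failure sequence above can be reproduced with a toy model. This is a minimal sketch, not driver code: the array is assumed to be a plain dict keyed by DeviceID (the `backend` dict and the `device_id_of` helper are hypothetical), and the provider_location string is the one quoted earlier in this report, parsed with Python's `ast.literal_eval`.

```python
import ast

# provider_location string copied from this report: a Python dict literal
# (with u'' prefixes) wrapping the CIM object path of the VMAX volume.
LOCATION = ("{'classname': u'Symm_StorageVolume', 'keybindings': "
            "{'CreationClassName': u'Symm_StorageVolume', "
            "'SystemName': u'SYMMETRIX+000198701426', 'DeviceID': u'00088', "
            "'SystemCreationClassName': u'Symm_StorageSystem'}}")


def device_id_of(provider_location):
    """Hypothetical helper: return the DeviceID from a provider_location."""
    # literal_eval parses the dict literal without executing arbitrary code.
    return ast.literal_eval(provider_location)["keybindings"]["DeviceID"]


# Hypothetical stand-in for the array: DeviceID -> contents of that device.
backend = {}

# volA is created and the array gives it DeviceID 00088.
backend["00088"] = "volA data"
vol_a = {"id": "volA", "provider_location": LOCATION}

# Case 1 or 2: the device disappears, but the cinder record for volA stays.
del backend["00088"]

# (b) volB is created and the array hands out the freed DeviceID again,
# so both cinder records carry an identical provider_location.
backend["00088"] = "volB critical data"
vol_b = {"id": "volB", "provider_location": LOCATION}

# (d)/(e) Retrying the delete of volA locates the device by DeviceID alone...
backend.pop(device_id_of(vol_a["provider_location"]))

# ...and volB's data is gone, although nobody asked to delete volB.
print("00088" in backend)  # False
```

Nothing in the lookup distinguishes volA from volB, which is why both proposals in this thread add a second identifier that must match before the driver acts on the device.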

Proposal:
In addition to the CIM object path, also store the "EMCWWN" property value in the provider_location.

<PROPERTY NAME="EMCWWN" TYPE="string">
<VALUE>60000970000198700498533030393730</VALUE>
</PROPERTY>

This should be the VPD page 83 identifier, which is globally unique across time. Then modify the _find_lun(self, volume) method so that, after looking up the CIM instance, it verifies the EMCWWN value against the one stored in the provider_location. If they match, all is good. If they do not, the instance is not the volume cinder was tracking: the tracked volume is actually gone from the backend, and a lookup error should be raised.
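A minimal sketch of the proposed check, under stated assumptions: `find_lun`, `build_provider_location`, `VolumeLookupError` and the `"wwn"` key are hypothetical names, and the CIM lookup is replaced by a caller-supplied function that returns a dict of instance properties. It is not the driver's actual `_find_lun()`.

```python
import ast


class VolumeLookupError(LookupError):
    """Hypothetical error: the volume cinder tracked is no longer on the array."""


def build_provider_location(keybindings, wwn):
    """Store the CIM object path plus the EMCWWN read at create time.

    The "wwn" key is a hypothetical name chosen for this sketch.
    """
    return repr({"classname": "Symm_StorageVolume",
                 "keybindings": keybindings,
                 "wwn": wwn})


def find_lun(volume, get_instance):
    """Sketch of the proposed _find_lun() check.

    get_instance is a hypothetical callable that returns the CIM instance for
    the given keybindings (modelled here as a dict of properties), or None.
    """
    location = ast.literal_eval(volume["provider_location"])
    instance = get_instance(location["keybindings"])
    if instance is None:
        raise VolumeLookupError(volume["id"])
    stored_wwn = location.get("wwn")
    # Records written before the change carry no WWN and skip the check.
    if stored_wwn is not None and instance["EMCWWN"] != stored_wwn:
        # Same DeviceID but a different device: the tracked volume is gone.
        raise VolumeLookupError(volume["id"])
    return instance


# Usage: volA was created with the WWN shown in this report ...
keys = {"SystemName": "SYMMETRIX+000198701426", "DeviceID": "00088"}
vol_a = {"id": "volA",
         "provider_location": build_provider_location(
             keys, "60000970000198700498533030393730")}

# ... but DeviceID 00088 is now bound to volB (this WWN is made up).
array = {"00088": {"EMCWWN": "60000970000198700498533030393731"}}

try:
    find_lun(vol_a, lambda kb: array.get(kb["DeviceID"]))
    outcome = "found"
except VolumeLookupError:
    outcome = "volA is gone; volB is left alone"
print(outcome)
```

Letting records without a stored WWN pass is one possible design choice for volumes created before such a change; a stricter variant could refuse them instead.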

Revision history for this message
Xing Yang (xing-yang) wrote :

See this patch: https://review.openstack.org/#/c/140910/. We are saving the volume UUID in ElementName. The volume UUID is unique, so even if two volumes have the same object path, their ElementNames will differ. So I don't think EMCWWN is needed. Anyway, we'll see how to address this problem. Thanks.

Changed in cinder:
assignee: nobody → Xing Yang (xing-yang)
status: New → Confirmed
importance: Undecided → Medium
tags: added: drivers emc
Revision history for this message
Carl Pecinovsky (csky) wrote :

Xing,
Given your proposed fix in 140910, the fix for this bug would be to verify that the ElementName retrieved in _find_lun() matches volume['id'].
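A minimal sketch of the ElementName check suggested here, assuming the CIM instance is available as a dict of properties. `verify_element_name`, `expected_element_name` and `VolumeNotFound` are hypothetical names, not the driver's API. The merged commit in this report describes the element name as 'OS-' + cinder UUID, so the prefix is a parameter.

```python
import uuid


class VolumeNotFound(LookupError):
    """Hypothetical stand-in for the driver's 'volume not found' error."""


def expected_element_name(volume_id, prefix=""):
    """ElementName the driver is assumed to have set at create time."""
    return prefix + volume_id


def verify_element_name(volume, instance, prefix=""):
    """Return instance only if its ElementName identifies this cinder volume."""
    expected = expected_element_name(volume["id"], prefix)
    if instance is None or instance.get("ElementName") != expected:
        raise VolumeNotFound(volume["id"])
    return instance


vol_a = {"id": str(uuid.uuid4())}
vol_b = {"id": str(uuid.uuid4())}
# DeviceID reused: the instance found via volA's provider_location is volB's.
instance = {"DeviceID": "00088", "ElementName": "OS-" + vol_b["id"]}

print(verify_element_name(vol_b, instance, prefix="OS-") is instance)  # True
try:
    verify_element_name(vol_a, instance, prefix="OS-")
except VolumeNotFound:
    print("volA not found; refusing to act on volB")
```

The same comparison also covers the shared-array case raised later in this thread: a device re-bound by a separate cinder process and database carries that deployment's volume UUID, so it fails the check for this database's record.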

This takes things in a somewhat different direction from the workaround we put in place for https://bugs.launchpad.net/cinder/+bug/1395903, but it should be OK for OpenStack.

Revision history for this message
Xing Yang (xing-yang) wrote :

Carl,

I think the fix merged in https://review.openstack.org/140910 should have fixed this problem.

Revision history for this message
Carl Pecinovsky (csky) wrote :

Xing,
Please see my comment #2. It does not look fixed: I was expecting to see code in _find_lun() that validates the ElementName of the looked-up volume, i.e. that it matches the cinder volume ID in this cinder DB. The issue is that a separate cinder process and DB could have unbound the volume from the pool and bound a different volume to the same DeviceID.

Revision history for this message
Carl Pecinovsky (csky) wrote : Fw: EMC Support Product Updates

Helen,
It appears v8.0.3 is available. I have not seen anything new posted to
https://bugs.launchpad.net/cinder/+bug/1450647, so I'm wondering when you
expect a fix to go up for review. Thanks.

Carl Pecinovsky
PowerVC Storage, Systems Group
Rochester MN, Dept WLOA, 015-3/G205
----- Forwarded by Carl Pecinovsky/Rochester/IBM on 2015-06-09 10:20 AM -----

From: <email address hidden>
To: Carl Pecinovsky/Rochester/IBM@IBMUS
Date: 2015-06-07 07:07 AM
Subject: EMC Support Product Updates

Your EMC Product Updates
Updates are now available for the following product(s):
Notification Frequency: WEEKLY

Product(s) | Content Type | Date | Title
SMI-S Provider | Product Documentation | 2015-06-05 | Solutions Enabler, VSS Provider and SMI-S Provider 8.0.3 Release Notes
SMI-S Provider | Product Documentation | 2015-06-05 | SMI-S Provider 8.0.3 Programmer's Guide

You are receiving this notification because you have subscribed to product
updates through the EMC Support website.
Sincerely,
EMC Customer Service

Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

Automatically unassigning due to inactivity.

Changed in cinder:
assignee: Xing Yang (xing-yang) → nobody
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/371545

Changed in cinder:
assignee: nobody → Helen Walsh (walshh2)
status: Confirmed → In Progress
Changed in cinder:
milestone: none → ocata-1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/371545
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=ae4d4799375cb8253b2f06e023e9d681de672053
Submitter: Jenkins
Branch: master

commit ae4d4799375cb8253b2f06e023e9d681de672053
Author: Helen Walsh <email address hidden>
Date: Fri Sep 16 14:41:03 2016 +0100

    VMAX driver - Ensure VMAX volume matches cinder db volume

    The VMAX cinder driver locates volumes on the backend by
    using the volume's backend 'DeviceID'. It does not check
    to ensure the volume retrieved matches the volume in the
    cinder database. This can cause inconsistencies in cinder,
    where a volume may be deleted off the backend, but remains
    in the cinder database. This patch addresses the issue by
    ensuring the 'ElementName' of the volume retrieved matches
    the VMAX cinder driver element name ('OS-' + cinder UUID).

    Change-Id: I8da37cd772d24fbb63a6fb12ccd782ffb4fef5c2
    Closes-Bug: #1401297

Changed in cinder:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 10.0.0.0b2

This issue was fixed in the openstack/cinder 10.0.0.0b2 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (driverfixes/newton)

Fix proposed to branch: driverfixes/newton
Review: https://review.openstack.org/536396

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (driverfixes/newton)

Reviewed: https://review.openstack.org/536396
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=cca267a3e333b1074b63830c511664fa94f5cbd2
Submitter: Zuul
Branch: driverfixes/newton

commit cca267a3e333b1074b63830c511664fa94f5cbd2
Author: Helen Walsh <email address hidden>
Date: Fri Sep 16 14:41:03 2016 +0100

    VMAX driver - Ensure VMAX volume matches cinder db volume

    The VMAX cinder driver locates volumes on the backend by
    using the volume's backend 'DeviceID'. It does not check
    to ensure the volume retrieved matches the volume in the
    cinder database. This can cause inconsistencies in cinder,
    where a volume may be deleted off the backend, but remains
    in the cinder database. This patch addresses the issue by
    ensuring the 'ElementName' of the volume retrieved matches
    the VMAX cinder driver element name (cinder UUID).

    Fix relevant for Ocata and Newton only as VMAX driver changed
    from SMI-S to REST in Pike.

    Change-Id: I8da37cd772d24fbb63a6fb12ccd782ffb4fef5c2
    Closes-Bug: #1401297
    (cherry picked from commit ae4d4799375cb8253b2f06e023e9d681de672053)

tags: added: in-driverfixes-newton