SolidFire driver is creating duplicate volumes when API responses are lost and retried

Bug #1896112 reported by Fernando Ferraz
This bug affects 1 person
Affects: Cinder
Status: Fix Released
Importance: Undecided
Assigned to: Fernando Ferraz
Milestone: (none listed)

Bug Description

A customer is seeing duplicate volumes when the network drops the response that the SolidFire cluster sends to the Cinder driver to confirm volume creation. The driver then resends the create request even though the volume was already created successfully. This produces duplicate volumes with the same name. This customer's OpenStack deployment attaches hosts to volumes by name, so duplicated volume names can cause issues.

More specifically, in def create_volume(self, volume), we should check whether a volume with the sf_volume_prefix name already exists before sending the create volume API command to SolidFire, to ensure we never create duplicate volumes.
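
The check proposed above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory FakeCluster in place of the SolidFire API; the class, its methods, and DuplicateVolumeError are illustrative names, not the driver's actual code.

```python
class DuplicateVolumeError(Exception):
    """Raised when a volume with the requested name already exists."""


class FakeCluster:
    """Hypothetical stand-in for the SolidFire backend (not the real API)."""

    def __init__(self):
        self._volumes = []  # list of (volume_id, name) tuples
        self._next_id = 1

    def list_volumes_by_name(self, name):
        # Return every stored volume whose name matches.
        return [v for v in self._volumes if v[1] == name]

    def create_volume(self, name):
        # Like the real backend, this does not enforce unique names.
        volume_id = self._next_id
        self._next_id += 1
        self._volumes.append((volume_id, name))
        return volume_id


def create_volume_once(cluster, name):
    """Create a volume only if no volume with that name exists yet."""
    if cluster.list_volumes_by_name(name):
        raise DuplicateVolumeError(f"volume {name!r} already exists")
    return cluster.create_volume(name)
```

With this guard, a second create request for the same name raises instead of silently adding a duplicate, so the caller can decide how to recover.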

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.opendev.org/756184

Changed in cinder:
assignee: nobody → Fernando Ferraz (fernando-ferraz)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.opendev.org/756184
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=42c92cc407d475751bc61c98445c9c0740a71496
Submitter: Zuul
Branch: master

commit 42c92cc407d475751bc61c98445c9c0740a71496
Author: Fernando Ferraz <email address hidden>
Date: Mon Oct 5 19:20:15 2020 -0300

    NetApp SolidFire: Fix duplicate volume when API response is lost

    The SolidFire driver retries API requests in case a connection
    error occurs. When the network is unstable, the SolidFire
    backend may successfully receive and process a create volume
    operation but fail to deliver the response back to the driver.

    When this scenario occurs, the SolidFire driver automatically
    resends the request, creating a second volume and leaving a
    duplicate unused. Although this doesn't affect driver
    functionality at first (the volume id from the cluster is
    always correctly associated with the cinder provider id),
    further operations may hit the unused volume, leading to
    unexpected behavior.

    This patch fixes this issue by:

    1. Checking if the volume name already exists in the
    backend before trying to create it. Volume creation will
    raise an exception and abort if a volume is found.

    2. Checking for volume creation right after a read timeout is
    detected, preventing invalid API calls.

    3. Adding option 'sf_volume_create_timeout' to the SolidFire
    driver, to allow users to set the appropriate timeout value for
    their environment.

    Closes-Bug: #1896112
    Change-Id: I4383b691a8cc4aacb046332e418aafb88ba8ba56
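
The behaviour described in the commit message can be sketched roughly as follows. This is not the merged driver code: FlakyCluster, ReadTimeout, and create_volume_safely are hypothetical names, and the create_timeout parameter only loosely plays the role of the sf_volume_create_timeout option. The sketch checks for the name first and, after a read timeout, looks the volume up instead of resending the create request.

```python
import time


class ReadTimeout(Exception):
    """Hypothetical error: the request was sent but no response arrived."""


class FlakyCluster:
    """Hypothetical backend that creates a volume but drops the first response."""

    def __init__(self):
        self.volumes = []  # volume names; index + 1 is the volume id
        self.drop_next_response = True

    def find_volume_ids(self, name):
        return [i + 1 for i, n in enumerate(self.volumes) if n == name]

    def create_volume(self, name):
        # The volume is stored before the response is (possibly) lost.
        self.volumes.append(name)
        volume_id = len(self.volumes)
        if self.drop_next_response:
            self.drop_next_response = False
            raise ReadTimeout(name)
        return volume_id


def create_volume_safely(cluster, name, create_timeout=5.0, poll_interval=0.1):
    """Create a volume; on a lost response, look it up instead of resending."""
    if cluster.find_volume_ids(name):
        raise ValueError(f"volume {name!r} already exists")
    try:
        return cluster.create_volume(name)
    except ReadTimeout:
        # The backend may have processed the request. Poll for the volume
        # until create_timeout seconds have passed rather than re-issuing it.
        deadline = time.monotonic() + create_timeout
        while True:
            ids = cluster.find_volume_ids(name)
            if ids:
                return ids[0]
            if time.monotonic() >= deadline:
                raise  # re-raise the original ReadTimeout
            time.sleep(poll_interval)
```

In this sketch a dropped response still yields exactly one volume on the fake backend, because the retry path queries for the volume rather than creating it again.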

Changed in cinder:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 18.0.0.0b1

This issue was fixed in the openstack/cinder 18.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 17.1.0

This issue was fixed in the openstack/cinder 17.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 16.3.0

This issue was fixed in the openstack/cinder 16.3.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 15.5.0

This issue was fixed in the openstack/cinder 15.5.0 release.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (stable/queens)

Change abandoned by "Eric Harney <email address hidden>" on branch: stable/queens
Review: https://review.opendev.org/c/openstack/cinder/+/764278
Reason: This has been here for a while but the Stein backport is not passing unit test jobs yet: https://review.opendev.org/c/openstack/cinder/+/764276

Re-open this patch if desired once the newer branches are sorted out.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (stable/rocky)

Change abandoned by "Eric Harney <email address hidden>" on branch: stable/rocky
Review: https://review.opendev.org/c/openstack/cinder/+/764277
Reason: This has been here for a while but the Stein backport is not passing unit test jobs yet: https://review.opendev.org/c/openstack/cinder/+/764276

Re-open this patch if desired once the newer branches are sorted out.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (stable/stein)

Change abandoned by "Brian Rosmaita <email address hidden>" on branch: stable/stein
Review: https://review.opendev.org/c/openstack/cinder/+/764276
Reason: Stein transitioned to End of Life by change Icf9a539a7b8b and is accepting no more changes.
