PowerMax driver may deadlock moving volumes between SGs

Bug #1980870 reported by Gorka Eguileor
Affects: Cinder
Status: Fix Released
Importance: Low
Assigned to: Gorka Eguileor

Bug Description

There's a potential deadlock scenario in PowerMax's masking.py "move_volume_between_storage_groups" method.

The method uses two locks, one for the source Storage Group and another for the destination Storage Group. If two requests going in opposite directions are received simultaneously, their first lock acquisitions can interleave, resulting in a deadlock.

    @coordination.synchronized(
        "emc-sg-{source_storagegroup_name}-{serial_number}")
    @coordination.synchronized(
        "emc-sg-{target_storagegroup_name}-{serial_number}")
    def move_volume_between_storage_groups(
            self, serial_number, device_id, source_storagegroup_name,
            target_storagegroup_name, extra_specs, force=False,
            parent_sg=None):

The scenario would be like this:

- User requests an instance migration from A to B
- User requests an instance migration from B to A
- Driver acquires the first lock for A-to-B, which is something like cinder-emc-sg-SGA-###
- Driver acquires the first lock for B-to-A, which is something like cinder-emc-sg-SGB-###

The deadlock happens because A-to-B waits forever for the lock held by the B-to-A operation, which in turn cannot proceed because it is waiting for the lock held by A-to-B.
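
For illustration, a minimal standalone sketch of the same lock-ordering problem, using plain threading locks as a stand-in for the driver's coordination locks (the lock objects, names, and helper below are invented for this example):

    # Two threads acquire the same pair of locks in opposite orders and
    # deadlock; threading.Lock stands in for the tooz-backed coordination
    # locks used by the driver.
    import threading
    import time

    lock_sga = threading.Lock()  # stands in for "cinder-emc-sg-SGA-###"
    lock_sgb = threading.Lock()  # stands in for "cinder-emc-sg-SGB-###"

    def move(first_lock, second_lock, label):
        with first_lock:            # first lock acquired
            time.sleep(0.1)         # widen the race so both threads hold their first lock
            with second_lock:       # blocks forever once the other thread holds it
                print(label, "completed")

    # A-to-B takes SGA's lock then SGB's; B-to-A takes them in the opposite order.
    threading.Thread(target=move, args=(lock_sga, lock_sgb, "A-to-B")).start()
    threading.Thread(target=move, args=(lock_sgb, lock_sga, "B-to-A")).start()
    # Neither thread ever completes: each waits on the lock the other holds.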

Changed in cinder:
status: New → In Progress
Revision history for this message
Sofia Enriquez (lsofia-enriquez) wrote :
Changed in cinder:
importance: Undecided → Low
tags: added: deadlock powermax
Eric Harney (eharney)
tags: added: drivers
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/848900
Committed: https://opendev.org/openstack/cinder/commit/411852892d803b606110d0956a59764925c16ec6
Submitter: "Zuul (22348)"
Branch: master

commit 411852892d803b606110d0956a59764925c16ec6
Author: Gorka Eguileor <email address hidden>
Date: Wed Jul 6 20:51:34 2022 +0200

    PowerMax: Fix deadlock moving SGs

    There's a potential deadlock scenario in PowerMax's masking.py
    "do_move_volume_between_storage_groups" method.

    The method uses 2 locks, one for the source Storage Group and another
    for the destination Storage Group, and it could happen that if 2
    requests going in opposite directions are received simultaneously their
    first lock acquisition interleaves resulting in a deadlock situation.

        @coordination.synchronized(
            "emc-sg-{source_storagegroup_name}-{serial_number}")
        @coordination.synchronized(
            "emc-sg-{target_storagegroup_name}-{serial_number}")
        def do_move_volume_between_storage_groups(
            serial_number, source_storage_group_name,
            target_storage_group_name):

    The scenario would be like this:

    - User requests an instance migration from A to B
    - User requests an instance migration from B to A
    - Driver acquires the first lock for A-to-B for example something like
      cinder-emc-sg-SGA-###
    - Driver acquires the first lock for B-to-A for example something like
      cinder-emc-sgSGB-###

    The deadlock happens because A-to-B waits forever for the lock held by
    the B-to-A operation, which in turn cannot proceed because it’s waiting
    for lock held by A-to-B.

    This patch fixes it using the new coordination.synchronized
    functionality that ensures that a series of locks are always acquired in
    the same order, preventing deadlocks.

    Closes-Bug: #1980870
    Change-Id: I7eda4645575cfaedcf45d73ab3a215976d3fac3a
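
For context, a minimal sketch of the general technique the fix relies on: acquire all locks in a deterministic (sorted) order so that opposite-direction requests can never interleave. The helper below is illustrative only and is not the actual Cinder coordination code:

    # Deadlock avoidance by lock ordering: sort the lock names and acquire
    # them in that fixed order. threading.Lock stands in for the distributed
    # coordination locks; acquire_in_order is a hypothetical helper.
    import threading
    from contextlib import ExitStack

    def acquire_in_order(named_locks):
        # Sort by name so every caller acquires the same locks in the same order.
        stack = ExitStack()
        for _name, lock in sorted(named_locks, key=lambda pair: pair[0]):
            stack.enter_context(lock)
        return stack

    locks = {"emc-sg-SGA-123": threading.Lock(),
             "emc-sg-SGB-123": threading.Lock()}

    # Both A-to-B and B-to-A moves end up taking SGA's lock before SGB's lock,
    # so the interleaving described above can no longer occur.
    with acquire_in_order(locks.items()):
        pass  # move the volume between storage groups here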

Changed in cinder:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 24.0.0.0rc1

This issue was fixed in the openstack/cinder 24.0.0.0rc1 release candidate.
