Cinder

Race between cinder and nova allows multiple detaches

Bug #1238093 reported by Duncan Thomas on 2013-10-10

This bug affects 5 people

Affects		Status	Importance	Assigned to	Milestone
	Cinder	Fix Released	High	Gorka Eguileor

Bug Description

There's a race during detach which can cause a volume to be left in an invalid status if multiple detaches are issued on the same volume.

A diagram of the race can be found at http://paste.openstack.org/show/48223/ since launchpad can't handle the formatting.

The fix I think is that begin_detaching needs to atomically check that the status is [in-use|detaching] and set it to detaching. Then detach() needs to atomically check for detaching and set to in-use, both cases returning exceptions as relevant.

See original description

Duncan Thomas (duncan-thomas) on 2013-10-10

description:	updated
description:	updated

Revision history for this message

Avishay Traeger (avishay-il) wrote on 2013-10-13:

I think this is a specific case of a more general problem - that's what happens when check-and-set isn't atomic. I thought about this a little while back but unfortunately got sidetracked and didn't follow up. This could either be done at the DB level or by grabbing a lock.

Changed in cinder:
importance:	Undecided → High
status:	New → Confirmed

Lee Li (lilinguo) on 2013-12-25

Changed in cinder:
assignee:	nobody → Lee Li (lilinguo)

Lee Li (lilinguo) on 2014-01-06

Changed in cinder:
assignee:	Lee Li (lilinguo) → nobody

Ryan McNair (rdmcnair) on 2014-06-23

Changed in cinder:
assignee:	nobody → Ryan McNair (rdmcnair)

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-06-23: Fix proposed to cinder (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/101797

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-06-23: Change abandoned on cinder (stable/icehouse)

Change abandoned by Ryan McNair (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/101797
Reason: A fix must be made for master instead of icehouse

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-06-26: Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/102705

Changed in cinder:
status:	Confirmed → In Progress

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-07-03: Change abandoned on cinder (master)

Change abandoned by Ryan McNair (<email address hidden>) on branch: master
Review: https://review.openstack.org/102705
Reason: Abondoning until a more complete global locking mechanism (https://review.openstack.org/#/c/95037/) lands.

Revision history for this message

Walt Boring (walter-boring) wrote on 2015-02-06:

What needs to happen to prevent this race is a select for update on the DB entry

OpenStack Infra (hudson-openstack) on 2015-09-01

Changed in cinder:
assignee:	Ryan McNair (rdmcnair) → Gorka Eguileor (gorka)

Revision history for this message

Zhen Zhen Jiang (zzjiang) wrote on 2015-10-27:

Hi, Gorka, could you kindly tell me when the fix for this issue would be available and your progress on it? Thank you.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-11-22: Related fix merged to cinder (master)

Reviewed: https://review.openstack.org/218012
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=938cb0c5ab11638a07c22e9db2b0d241223d9d9f
Submitter: Jenkins
Branch: master

commit 938cb0c5ab11638a07c22e9db2b0d241223d9d9f
Author: Gorka Eguileor <email address hidden>
Date: Fri Aug 28 01:14:02 2015 +0200

Move get_by_id to CinderObject

    Currently each Versioned Object needs to implement its own get_by_id,
    with this patch they don't need anymore, since it will be included in
    the base class CinderObject and it will work for all the objects.

This will help for other things like having a refresh method or
conditional updates in the objects.

    Related-Bug: #1490944
    Related-Bug: #1238093
    Related-Bug: #1490946
    Related-Bug: #1469659
    Change-Id: I355dc8eaefed93003533ee083f74acd1315f057e

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-11-26: Change abandoned on cinder (master)

Change abandoned by Gorka Eguileor (<email address hidden>) on branch: master
Review: https://review.openstack.org/218936

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-11-30: Related fix merged to cinder (master)

#10

Download full text (3.6 KiB)

Reviewed: https://review.openstack.org/205834
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=1070c28774a74f3e482cb21dbd98f031010ec4a5
Submitter: Jenkins
Branch: master

commit 1070c28774a74f3e482cb21dbd98f031010ec4a5
Author: Gorka Eguileor <email address hidden>
Date: Sat Jul 25 12:34:28 2015 +0200

Add atomic conditional updates to objects

To allow atomic state changes across Cinder services we need to
implement a way to easily do compare-and-swap.

This patch adds methods to allow compare-and-swap on DB models and on
Cinder Versioned Objects as well.

    Conditions for the compare part of the update can consist of:
    - Inclusion: status == 'available'
    - Exclusion: status != 'in-use'
    - Multi-inclusion: status in ('available', 'error')
    - Multi-exclusion: status not in ('attaching', 'in-use')
    - Sqlalchemy filters

    A complete example of usage would be the compare-and-swap used in volume
    delete requests that has to take in consideration not only the status
    but the attach and migration status as well as the volume not having
    snapshots:

     now = timeutils.utcnow()
     expected = {'attach_status': db.Not('attached'),
                 'migration_status': None,
                 'consistencygroup_id': None}
     good_status = ('available', 'error', 'error_restoring',
                    'error_extending')
     if not force:
         expected.update(status=good_status)

     # Volume cannot have snapshots if we want to delete it
     filters = [~sql.exists().where(models.Volume.id ==
                                    models.Snapshot.volume_id)]

     updated = vol_obj.conditional_update(
         {'status': 'deleting',
          'previous_status': vol_obj.model.status,
          'terminated_at': now},
         expected,
         filters)

    It can also be specified whether to save already dirtied fields from the
    objects or not and by default (if no expected_values argument is
    provided) it will make sure that the entry in the DB has the same values
    as the objects we are saving.

We can select values based on conditions using Case objects in the
'values' argument. For example:

     has_snapshot_filter = sql.exists().where(
         models.Snapshot.volume_id == models.Volume.id)
     case_values = volume.Case([(has_snapshot_filter, 'has-snapshot')],
                               else_='no-snapshot')
     volume.conditional_update({'status': case_values},
                               {'status': 'available'})

    Exclusion and multi-exclusion will handle, by default, NULL values like
    Python does instead of like SQL does, so NULL values will be considered
    different than any non NULL values. That way if we search for something
    not equal to 1 we will also get NULL values.

    WARNING: SQLAlchemy does not allow selecting order of SET clauses, so
    for now we cannot do things like
        {'previous_status': model.status, 'status': 'retyping'}
    because it will result in both previous_status and status being set to
    'retyping'. Issue has been reported [1] and a patch to fix it [...

Reviewed:  https://review.openstack.org/205834
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=1070c28774a74f3e482cb21dbd98f031010ec4a5
Submitter: Jenkins
Branch:    master

commit 1070c28774a74f3e482cb21dbd98f031010ec4a5
Author: Gorka Eguileor <geguileo@redhat.com>
Date:   Sat Jul 25 12:34:28 2015 +0200

Add atomic conditional updates to objects
    
    To allow atomic state changes across Cinder services we need to
    implement a way to easily do compare-and-swap.
    
    This patch adds methods to allow compare-and-swap on DB models and on
    Cinder Versioned Objects as well.
    
    Conditions for the compare part of the update can consist of:
    - Inclusion: status == 'available'
    - Exclusion: status != 'in-use'
    - Multi-inclusion: status in ('available', 'error')
    - Multi-exclusion: status not in ('attaching', 'in-use')
    - Sqlalchemy filters
    
    A complete example of usage would be the compare-and-swap used in volume
    delete requests that has to take in consideration not only the status
    but the attach and migration status as well as the volume not having
    snapshots:
    
     now = timeutils.utcnow()
     expected = {'attach_status': db.Not('attached'),
                 'migration_status': None,
                 'consistencygroup_id': None}
     good_status = ('available', 'error', 'error_restoring',
                    'error_extending')
     if not force:
         expected.update(status=good_status)
    
     # Volume cannot have snapshots if we want to delete it
     filters = [~sql.exists().where(models.Volume.id ==
                                    models.Snapshot.volume_id)]
    
     updated = vol_obj.conditional_update(
         {'status': 'deleting',
          'previous_status': vol_obj.model.status,
          'terminated_at': now},
         expected,
         filters)
    
    It can also be specified whether to save already dirtied fields from the
    objects or not and by default (if no expected_values argument is
    provided) it will make sure that the entry in the DB has the same values
    as the objects we are saving.
    
    We can select values based on conditions using Case objects in the
    'values' argument. For example:
    
     has_snapshot_filter = sql.exists().where(
         models.Snapshot.volume_id == models.Volume.id)
     case_values = volume.Case([(has_snapshot_filter, 'has-snapshot')],
                               else_='no-snapshot')
     volume.conditional_update({'status': case_values},
                               {'status': 'available'})
    
    Exclusion and multi-exclusion will handle, by default, NULL values like
    Python does instead of like SQL does, so NULL values will be considered
    different than any non NULL values.  That way if we search for something
    not equal to 1 we will also get NULL values.
    
    WARNING: SQLAlchemy does not allow selecting order of SET clauses, so
    for now we cannot do things like
        {'previous_status': model.status, 'status': 'retyping'}
    because it will result in both previous_status and status being set to
    'retyping'.  Issue has been reported [1] and a patch to fix it [2] has
    been submitted.
    [1]: https://bitbucket.org/zzzeek/sqlalchemy/issues/3541
    [2]: https://github.com/zzzeek/sqlalchemy/pull/200
    
    Specs: https://review.openstack.org/232599/
    
    Implements: blueprint cinder-volume-active-active-support
    Related-Bug: #1490944
    Related-Bug: #1238093
    Related-Bug: #1490946
    Related-Bug: #1469659
    Related-Bug: #1493120
    Related-Bug: #1493419
    Related-Bug: #1493476
    Related-Bug: #1494466
    Change-Id: If90a37f8c7d6fad8fc1f861d52ba862875920cdc

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-11-30: Fix merged to cinder (master)

#11

Reviewed: https://review.openstack.org/216376
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=9bcf96230e4fd6d7998782c0c62cc8188df518ac
Submitter: Jenkins
Branch: master

commit 9bcf96230e4fd6d7998782c0c62cc8188df518ac
Author: Gorka Eguileor <email address hidden>
Date: Fri Aug 21 19:06:07 2015 +0200

Improve metadata update operations

Currently our metadata operations in the DB and the API are less than
optimal, main issues are:

    - To update metadata first we get all metadata in the API and add
      requested update metadata, then on the DB we retrieve each of the
      metadata keys and update one by one contents.
    - When deletion of existing keys is requested in the DB we retrieve all
      metadata contents and for those that are not in the new metadata we
      retrieve the contents from the DB (again) and delete the entry.

    This patch changes this so we no longer retrieve metadata in the API,
    deletion does not retrieve any metadata from the DB and just makes 1
    deletion request to the DB, and for changes we retrieve metadata once
    and then issue a bulk change to create new entries and update existing
    ones.

Partial-Bug: #1238093
Change-Id: I12506541cca61282122e319e3504560f68df225b

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-12-23:

#12

Reviewed: https://review.openstack.org/216377
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=c7bfd636b32cfeebeb45135a6aedcd8e9214ca11
Submitter: Jenkins
Branch: master

commit c7bfd636b32cfeebeb45135a6aedcd8e9214ca11
Author: Gorka Eguileor <email address hidden>
Date: Fri Aug 21 19:13:50 2015 +0200

Remove API races from attach and detach methods

This patch uses compare-and-swap to perform atomic DB changes to remove
API races in attach and detach related methods.

    Races are removes from the following methods:
    - attach
    - roll_detaching
    - begin_detaching
    - unreserve_volume
    - reserve_volume

Specs: https://review.openstack.org/232599/

    Implements: blueprint cinder-volume-active-active-support
    Closes-Bug: #1238093
    Change-Id: I646d946209e0921d1056ee89e59b82b814bf9c15

Changed in cinder:
status:	In Progress → Fix Released

Revision history for this message

Thierry Carrez (ttx) wrote on 2016-01-21: Fix included in openstack/cinder 8.0.0.0b2

#13

This issue was fixed in the openstack/cinder 8.0.0.0b2 development milestone.

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.