Accept Transfer race condition

Bug #1357432 reported by Huang Zhiteng
This bug affects 1 person
Affects    Status         Importance   Assigned to     Milestone
Cinder     Fix Released   Medium       Huang Zhiteng
Icehouse   Fix Released   Undecided    Unassigned

Bug Description

In the accept_transfer() workflow (https://github.com/openstack/cinder/blob/master/cinder/transfer/api.py#L193):

         try:
             # Transfer ownership of the volume now, must use an elevated
             # context.
             self.volume_api.accept_transfer(context,
                                             vol_ref,
                                             context.user_id,
                                             context.project_id)
             self.db.transfer_accept(context.elevated(),
                                     transfer_id,
                                     context.user_id,
                                     context.project_id)

   self.volume_api.accept_transfer() sends an RPC request (cast) to the volume manager, and the volume driver may then do some work in the backend (e.g. modify the account that owns the volume). Because this is a non-blocking RPC cast, the call returns almost immediately and the volume record in the DB is updated right afterwards. The DB update can therefore finish before the volume manager / driver has done its job. Unfortunately, some drivers (SolidFire, for example) rely on the original DB record to do their work. In that case the volume ends up in an inconsistent state (backend vs. Cinder DB) and becomes unusable.
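   The ordering problem described above can be modelled with a small, deterministic sketch. Nothing here is Cinder code: the pending queue, run_workflow() and the dictionary standing in for the DB are hypothetical, and the "cast" case is forced into the worst-case ordering (DB update first, driver second) to show what the driver would read.

```python
from collections import deque


def driver_accept_transfer(db, volume_id, seen):
    """Hypothetical driver step: records which owner it finds in the DB."""
    seen.append(db[volume_id]["project_id"])


def run_workflow(blocking):
    """Toy model of the accept-transfer flow; returns the owner the driver saw."""
    db = {"vol-1": {"project_id": "old-project"}}
    pending = deque()  # messages the volume manager has not processed yet
    seen = []
    message = (driver_accept_transfer, (db, "vol-1", seen))

    if blocking:
        # "call": the driver runs to completion before the API continues.
        fn, args = message
        fn(*args)
    else:
        # "cast": the message is only queued; the API continues immediately.
        pending.append(message)

    # Stands in for the DB update that transfers ownership.
    db["vol-1"]["project_id"] = "new-project"

    # The volume manager gets around to any queued work afterwards.
    while pending:
        fn, args = pending.popleft()
        fn(*args)
    return seen[0]


owner_seen_with_cast = run_workflow(blocking=False)
owner_seen_with_call = run_workflow(blocking=True)
print(owner_seen_with_cast, owner_seen_with_call)
```

   In this model the driver sees the new owner when the request is queued, and the original owner when the request blocks, which is the record a driver such as the one mentioned above would need.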

   We could either change accept_transfer() in the volume RPC API from a CAST to a blocking CALL, or pass enough of the original volume state to the volume manager so that it does not have to rely on unreliable DB state.
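   A minimal sketch of the second option, assuming the API side snapshots the volume record before updating it and ships that copy with the cast. The names (snapshot, pending, the dictionary-based DB, and the driver function's signature) are hypothetical and only illustrate the idea; they are not the Cinder RPC API.

```python
import copy


def driver_accept_transfer(original_volume, new_project):
    """Hypothetical driver step that needs the pre-transfer owner."""
    return {"from": original_volume["project_id"], "to": new_project}


db = {"vol-1": {"project_id": "old-project"}}
pending = []  # casts not yet processed by the volume manager

# API side: copy the record before touching it and send the copy along.
snapshot = copy.deepcopy(db["vol-1"])
pending.append((driver_accept_transfer, (snapshot, "new-project")))

# The DB update may now happen at any time without affecting the driver.
db["vol-1"]["project_id"] = "new-project"

# Manager side, some time later: the driver works from the snapshot only.
results = [fn(*args) for fn, args in pending]
print(results[0])
```

   The trade-off is that the cast stays non-blocking, but every piece of state a driver might need has to be included in the message up front.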

Changed in cinder:
importance: Undecided → Medium
status: New → Triaged
Changed in cinder:
assignee: nobody → Huang Zhiteng (zhiteng-huang)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/119635

Changed in cinder:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/119635
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=7e95b05b937e632ae8aee82cfa8f6f31835e6316
Submitter: Jenkins
Branch: master

commit 7e95b05b937e632ae8aee82cfa8f6f31835e6316
Author: Zhiteng Huang <email address hidden>
Date: Sun Sep 7 10:22:52 2014 -0700

    Fix possible race condition for accept transfer

    Accept transfer API workflow is currently like this:

      call volume_api.accept_transfer()
        |
        --- RPC cast to volume manager
              |
              --- volume manager calls volume driver accept_transfer()

      update volume's DB record

    Given the non-blocking nature of RPC cast, what happens in volume
    manager and volume driver can happen in parallel with the DB update.
    If volume driver relies on original DB record to do things, then
    DB record shouldn't be updated until volume driver finishes its job.

    So this patch change volume RPC API accept_transfer() from cast
    to call to make sure the workflow is in serialized manner. Also
    elevated the context when volume manager tries to update the DB
    record when driver has done accept_transfer().

    Change-Id: Ieae52e167aa02967338e0be5d78d570d682faa7a
    Closes-bug: #1357432

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
milestone: none → juno-rc1
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/127147

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/icehouse)

Reviewed: https://review.openstack.org/127147
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=ffc886b55b322c7ad55270bb40ee87c02426f3c0
Submitter: Jenkins
Branch: stable/icehouse

commit ffc886b55b322c7ad55270bb40ee87c02426f3c0
Author: Zhiteng Huang <email address hidden>
Date: Sun Sep 7 10:22:52 2014 -0700

    Fix possible race condition for accept transfer

    Accept transfer API workflow is currently like this:

      call volume_api.accept_transfer()
        |
        --- RPC cast to volume manager
              |
              --- volume manager calls volume driver accept_transfer()

      update volume's DB record

    Given the non-blocking nature of RPC cast, what happens in volume
    manager and volume driver can happen in parallel with the DB update.
    If volume driver relies on original DB record to do things, then
    DB record shouldn't be updated until volume driver finishes its job.

    So this patch change volume RPC API accept_transfer() from cast
    to call to make sure the workflow is in serialized manner. Also
    elevated the context when volume manager tries to update the DB
    record when driver has done accept_transfer().

    Closes-bug: #1357432

    Conflicts:
        cinder/volume/rpcapi.py

    Change-Id: Ieae52e167aa02967338e0be5d78d570d682faa7a
    (cherry picked from commit 7e95b05b937e632ae8aee82cfa8f6f31835e6316)

tags: added: in-stable-icehouse
Thierry Carrez (ttx)
Changed in cinder:
milestone: juno-rc1 → 2014.2