race condition in quick detach/attach to the same volume and vm

Bug #1457359 reported by Oscar Huang
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Tested on Juno with cells enabled.

The race condition happens as follows:
1. send a detach request to an existing VM with a volume;
2. send an attach request to attach the same volume to the same VM immediately after #1 in another process.

Expected result:
a. #2 is refused because #1 is still in progress, or
b. #2 completes after #1 has finished.

However, a race can occur with the following sequence:

 Req #1 finished the physical detach >>
 Req #1 finished the Cinder call (setting the volume to available) >>
 Req #2 came into the Nova API and got through the call flow, since the volume was now available >>
 Req #2 ran faster than Req #1 and updated the Nova DB BDMs with the volume info >>
 Req #2 finished and removed the existing volume info in BDMs >>
 The Cinder volume status and the Nova BDM state were now mismatched. The volume could no longer be attached or detached; both operations were refused.
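
The interleaving above can be replayed deterministically with a toy in-memory model. This is only a sketch: `replay_race`, the dictionary standing in for Cinder and the set standing in for the Nova BDM table are all hypothetical names, not Nova or Cinder APIs. It also assumes that an attach marks the volume "in-use" in Cinder and that the BDM row ends up removed by a late cleanup step, which is one reading of the sequence described in this report.

```python
def replay_race(cinder_status, nova_bdms, instance_id, volume_id):
    """Replay the reported interleaving on toy state (not real Nova/Cinder calls)."""
    pair = (instance_id, volume_id)
    events = []

    # Req #1 (detach): the physical detach is done and Cinder marks the volume available.
    cinder_status[volume_id] = "available"
    events.append("req1: cinder volume set to available")

    # Req #2 (attach): the API check passes because the volume looks available,
    # so the attach proceeds and writes its BDM info (assumption: attach sets in-use).
    if cinder_status[volume_id] != "available":
        raise RuntimeError("attach refused: volume not available")
    cinder_status[volume_id] = "in-use"
    nova_bdms.add(pair)
    events.append("req2: attached and BDM written")

    # Late cleanup (assumed): the BDM row for the pair is removed after the attach.
    nova_bdms.discard(pair)
    events.append("late cleanup: BDM removed")
    return events


if __name__ == "__main__":
    cinder = {"vol-1": "in-use"}
    bdms = {("vm-1", "vol-1")}
    for line in replay_race(cinder, bdms, "vm-1", "vol-1"):
        print(line)
    # Final toy state: Cinder says in-use, but no BDM row exists for the pair.
    print(cinder, bdms)
```

In this model the final state has Cinder reporting the volume as in-use while no BDM row exists, so a new attach fails the availability check and a new detach finds nothing to detach, which matches the stuck state described.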

Also, in our test case, the child cell Nova DB and the parent cell Nova DB became mismatched, because Req #2 overtook Req #1 while Req #1 was propagating its update from the child cell to the parent cell.

This issue is caused by the absence of a guard check against the Nova BDM table in the attach process. The suggested fix is to add a volume ID check against the Nova BDM table at the beginning of the request, to guarantee that no parallel modification happens for a single volume/instance pair.
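
A minimal sketch of such a guard, again on a toy model rather than Nova's actual code: `BdmTable`, `VolumeBusy` and `attach_volume` are hypothetical names, and a `threading.Lock` stands in for whatever atomic check the real database layer would provide. The attach path checks for an existing row for the volume/instance pair before doing anything else and refuses if one is still present.

```python
import threading


class VolumeBusy(Exception):
    """Raised when the volume/instance pair still has a BDM row."""


class BdmTable:
    """Toy stand-in for the Nova BDM table (hypothetical, in-memory only)."""

    def __init__(self):
        self._rows = set()
        self._lock = threading.Lock()

    def reserve(self, instance_id, volume_id):
        """Atomically check for an existing row and insert one if absent."""
        pair = (instance_id, volume_id)
        with self._lock:
            if pair in self._rows:
                raise VolumeBusy(
                    f"volume {volume_id} still has a BDM on instance {instance_id}"
                )
            self._rows.add(pair)

    def remove(self, instance_id, volume_id):
        """Drop the row once a detach has fully completed."""
        with self._lock:
            self._rows.discard((instance_id, volume_id))


def attach_volume(bdms, instance_id, volume_id):
    # Guard check first, before any Cinder call or further DB update.
    bdms.reserve(instance_id, volume_id)
    return "attached"
```

With this shape, an attach that arrives while the detach has not yet removed its row is refused (expected result "a" above), and it succeeds once `remove` has run.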

The attachment is a slice of the logs showing the out-of-order messages triggered in the test case.

Tags: volumes
Oscar Huang (huangxiwei) wrote :
tags: added: volumes
Changed in nova:
assignee: nobody → Chung Chih, Hung (lyanchih)
Markus Zoeller (markus_z) (mzoeller) wrote :

@Chung Chih, Hung (lyanchih) :

Since you are set as the assignee, I am switching the status to "In Progress".

Changed in nova:
status: New → In Progress
Chung Chih, Hung (lyanchih) wrote :

This bug looks like it was fixed by the following review:
https://review.openstack.org/#/c/88416/
Please provide your nova branch sum.

Changed in nova:
status: In Progress → Incomplete
Changed in nova:
assignee: lyanchih (lyanchih) → nobody
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired