Race condition in quick detach/attach of the same volume and VM
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Expired | Undecided | Unassigned |
Bug Description
Tested on Juno with Cells enabled.
The race condition happens as follows:
1. send a detach request for a volume attached to an existing VM;
2. immediately after #1, from another process, send a request to attach the same volume to the same VM.
Expected result:
a. #2 is refused because #1 is in progress, or
b. #2 finishes after #1 has finished.
However, the following interleaving can occur:
Req #1 finished the physical detach >>
Req #1 finished the Cinder call (setting the volume to "available") >>
Req #2 entered the Nova API and got through the call flow, since the volume is now available >>
Req #2 ran faster than Req #1 and updated the Nova DB BDMs with the volume info >>
Req #1 finished and removed the existing volume info from the BDMs >>
At this point the Cinder volume status and the Nova BDM state are mismatched. The volume becomes inoperable: both attach and detach requests are refused.
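The interleaving above is a classic check-then-act race: the attach path's only guard is the Cinder volume status, which the in-flight detach has already flipped. A minimal, deterministic Python replay of the sequence (the dictionaries are toy stand-ins for the Cinder status and the Nova BDM table, not real Nova data structures):

```python
# Deterministic replay of the race described above.
# `cinder` and `bdms` are hypothetical stand-ins for illustration only.

cinder = {"vol-1": "in-use"}   # volume starts attached
bdms = {"vm-1": {"vol-1"}}     # Nova BDM: vm-1 references vol-1

# Req #1 (detach): physical detach done, Cinder updated first.
cinder["vol-1"] = "available"

# Req #2 (attach) enters the API; the only guard is the Cinder status,
# which is now "available", so the request is let through.
assert cinder["vol-1"] == "available"

# Req #2 runs faster: it marks the volume in-use and writes the BDM row.
cinder["vol-1"] = "in-use"
bdms["vm-1"].add("vol-1")

# Req #1 finally performs its DB update and removes the BDM row,
# including the one Req #2 just wrote.
bdms["vm-1"].discard("vol-1")

# End state: Cinder says "in-use", but Nova has no BDM for the volume,
# so both a later attach and a later detach are refused.
print(cinder["vol-1"], bdms["vm-1"])
```

Running the replay leaves `cinder` reporting "in-use" while the BDM set is empty, which is exactly the mismatched state the bug report describes.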
Also, in our test case the child-cell and parent-cell Nova DBs became mismatched, since Req #2 overtook Req #1 while Req #1's update was still propagating from the child cell to the parent cell.
This issue is caused by the lack of a guard check against the Nova BDM table in the attach path. The suggested fix is to add a volume-ID check against the Nova BDM table at the beginning of the request, guaranteeing that for a single volume/instance pair no parallel modification can happen.
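One way to realize the suggested guard is to serialize operations per (instance, volume) pair and refuse a request while another is in flight. A sketch of the idea only, not the actual Nova patch; all names here are hypothetical, and real Nova would use its own locking utilities and BDM queries rather than an in-memory set:

```python
import threading

# Hypothetical in-flight registry keyed by (instance_id, volume_id).
_inflight = set()
_inflight_lock = threading.Lock()

class VolumeBusy(Exception):
    """Raised when a detach/attach on the same pair is still running."""

def guarded(instance_id, volume_id, operation):
    """Run `operation` only if no other attach/detach for this
    instance/volume pair is in progress; otherwise refuse (case a
    of the expected results above)."""
    key = (instance_id, volume_id)
    with _inflight_lock:
        if key in _inflight:
            raise VolumeBusy(f"{volume_id} on {instance_id} is busy")
        _inflight.add(key)
    try:
        return operation()
    finally:
        with _inflight_lock:
            _inflight.discard(key)
```

With this guard, Req #2 arriving while Req #1 still holds the pair would get an immediate refusal instead of racing Req #1 to the BDM table.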
The attachment is a slice of the logs showing the message disorder triggered in the test case.
tags: added: volumes
Changed in nova:
assignee: nobody → Chung Chih, Hung (lyanchih)
Changed in nova:
assignee: lyanchih (lyanchih) → nobody
@Chung Chih, Hung (lyanchih):
Since you are set as assignee, I have switched the status to "In Progress".