Race condition in quick detach/attach of the same volume and VM
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Expired | Undecided | Unassigned |
Bug Description
Tested on Juno with Cells enabled.
The race condition happens as follows:
1. send a detach request for a volume attached to an existing VM;
2. immediately after #1, from another process, send a request to attach the same volume to the same VM.
Expected result:
a. #2 is refused because #1 is in progress, or
b. #2 finishes after #1 has finished.
However, the following interleaving can occur:
Req #1 finished the physical detach >>
Req #1 finished the Cinder call (setting the volume to "available") >>
Req #2 entered the Nova API and got through the call flow, since the volume is now available >>
Req #2 ran faster than Req #1 and updated the Nova DB BDMs with the volume info >>
Req #1 finished and removed the existing volume info from the BDMs >>
At this point the Cinder volume status and the Nova BDM state are mismatched. The volume becomes inoperable: both attach and detach requests are refused.
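The interleaving above is a classic check-then-act race: the attach path's only guard is the Cinder volume status, which the in-flight detach has already flipped. A minimal, deterministic Python replay of the sequence (the dictionaries are toy stand-ins for the Cinder status and the Nova BDM table, not real Nova data structures):

```python
# Deterministic replay of the race described above.
# `cinder` and `bdms` are hypothetical stand-ins for illustration only.

cinder = {"vol-1": "in-use"}   # volume starts attached
bdms = {"vm-1": {"vol-1"}}     # Nova BDM: vm-1 references vol-1

# Req #1 (detach): physical detach done, Cinder updated first.
cinder["vol-1"] = "available"

# Req #2 (attach) enters the API; the only guard is the Cinder status,
# which is now "available", so the request is let through.
assert cinder["vol-1"] == "available"

# Req #2 runs faster: it marks the volume in-use and writes the BDM row.
cinder["vol-1"] = "in-use"
bdms["vm-1"].add("vol-1")

# Req #1 finally performs its DB update and removes the BDM row,
# including the one Req #2 just wrote.
bdms["vm-1"].discard("vol-1")

# End state: Cinder says "in-use", but Nova has no BDM for the volume,
# so both a later attach and a later detach are refused.
print(cinder["vol-1"], bdms["vm-1"])
```

Running the replay leaves `cinder` reporting "in-use" while the BDM set is empty, which is exactly the mismatched state the bug report describes.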
Also, in our test case the child-cell and parent-cell Nova DBs became mismatched, since Req #2 overtook Req #1 while Req #1's update was still propagating from the child cell to the parent cell.
This issue is caused by the lack of a guard check against the Nova BDM table in the attach path. The suggested fix is to add a volume-ID check against the Nova BDM table at the beginning of the request, guaranteeing that for a single volume/instance pair no parallel modification can happen.
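One way to realize the suggested guard is to serialize operations per (instance, volume) pair and refuse a request while another is in flight. A sketch of the idea only, not the actual Nova patch; all names here are hypothetical, and real Nova would use its own locking utilities and BDM queries rather than an in-memory set:

```python
import threading

# Hypothetical in-flight registry keyed by (instance_id, volume_id).
_inflight = set()
_inflight_lock = threading.Lock()

class VolumeBusy(Exception):
    """Raised when a detach/attach on the same pair is still running."""

def guarded(instance_id, volume_id, operation):
    """Run `operation` only if no other attach/detach for this
    instance/volume pair is in progress; otherwise refuse (case a
    of the expected results above)."""
    key = (instance_id, volume_id)
    with _inflight_lock:
        if key in _inflight:
            raise VolumeBusy(f"{volume_id} on {instance_id} is busy")
        _inflight.add(key)
    try:
        return operation()
    finally:
        with _inflight_lock:
            _inflight.discard(key)
```

With this guard, Req #2 arriving while Req #1 still holds the pair would get an immediate refusal instead of racing Req #1 to the BDM table.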
The attachment is a slice of the logs showing the message disorder triggered in the test case.
tags: added: volumes
Changed in nova:
assignee: nobody → Chung Chih, Hung (lyanchih)
Changed in nova:
assignee: lyanchih (lyanchih) → nobody
@Chung Chih, Hung (lyanchih):
Since you are set as assignee, I have switched the status to "In Progress".