OpenStack Compute (nova)

Race condition when deleting iscsi devices

Bug #1297635 reported by Sam Morrison on 2014-03-26

This bug affects 6 people

Affects                   | Status  | Importance | Assigned to | Milestone
OpenStack Compute (nova)  | Expired | Undecided  | Unassigned  |

Bug Description

Suppose two instances on the same compute node each have a volume attached (using the iSCSI backend).

If you delete both of them, each deletion triggers a volume disconnect, and the following happens:

The first request deletes the device:
echo 1 > /sys/block/sdr/device/delete

The second request triggers an iscsi_rescan, which then rediscovers the device.

The volume is then deleted from the Cinder backend.

Now you have a device that points to a deleted volume.

This is with a NetApp device where all the devices share the same IQN, using multipath on stable/havana.
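The interleaving above can be replayed with a small, deterministic model. This is a sketch under stated assumptions, not nova code: `Backend`, `Host`, and `replay_race` are hypothetical, device names double as LUN identifiers for simplicity, and the model assumes that a disconnect first rescans the iSCSI session (rediscovering every LUN the backend still exports to the host) and then deletes the volume's own device, while the backend only stops exporting a LUN after the disconnect has returned.

```python
"""Deterministic replay of the detach race (hypothetical model, not nova code)."""


class Backend:
    """Hypothetical storage array: tracks which LUNs are still exported to the host."""

    def __init__(self, luns):
        self.exported = set(luns)

    def unexport(self, lun):
        # Stands in for Cinder terminating the connection / deleting the volume.
        self.exported.discard(lun)


class Host:
    """Hypothetical compute node: tracks which SCSI block devices exist."""

    def __init__(self, backend):
        self.backend = backend
        self.devices = set(backend.exported)  # both volumes are attached

    def rescan(self):
        # Models an iSCSI session rescan: every LUN still exported reappears.
        self.devices |= self.backend.exported

    def delete_device(self, lun):
        # Models `echo 1 > /sys/block/<dev>/device/delete`.
        self.devices.discard(lun)

    def disconnect(self, lun):
        # Assumed order: rescan first, then remove this volume's own device.
        self.rescan()
        self.delete_device(lun)


def replay_race():
    backend = Backend({"sdr", "sds"})
    host = Host(backend)
    host.disconnect("sdr")   # request 1 removes sdr
    host.disconnect("sds")   # request 2 rescans; sdr is still exported, so it comes back
    backend.unexport("sdr")  # Cinder only now deletes volume 1
    backend.unexport("sds")
    return host.devices      # devices left pointing at deleted volumes


print(replay_race())  # {'sdr'}
```

The leftover `sdr` is the stale device from the report: it was rediscovered in the window between the first device removal and the backend dropping the export.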


Tracy Jones (tjones-i) on 2014-03-26

tags:

added: volumes

Nikola Đipanov (ndipanov) wrote on 2014-03-27:

If I understand correctly, the deletion step happens only if the volume is set to "delete_on_termination". Otherwise, yes, this seems like something we want to serialize in the libvirt iSCSI volume driver.

Changed in nova:
importance:	Undecided → High
importance:	High → Medium

Nikola Đipanov (ndipanov) wrote on 2014-03-27:

Hmmm, I haven't tried to reproduce this yet, so I'll leave the bug as "New" for now, but just by looking at the code I can't figure out where the rescan happens.

Sam Morrison (sorrison) wrote on 2014-03-28:

The rescan happens when the next volume is deleted. It happens too quickly: the first volume hasn't been deleted by Cinder yet, so the target is still discoverable.

Sam Morrison (sorrison) wrote on 2014-04-02:

OK, I've just worked out that this is only a problem when using multipath.

Nikola Đipanov (ndipanov) wrote on 2014-04-04:

Yep, after looking at the code it does seem that there is a race when using multipath. A likely fix is to add an instance-wide mutex on libvirt volume detach.
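A minimal sketch of the serialization idea, using the same kind of hypothetical model (`Node`, `detach`, and `detach_all` are illustrative names, not nova code). One assumption is worth stating loudly: the lock is held from the start of the host-side disconnect until the backend has stopped exporting the LUN. A lock around the host-side disconnect alone would still let another detach rescan in the window before Cinder removes the export. `threading.Lock` stands in for whatever inter-request mutex nova would actually use.

```python
import threading


class Node:
    """Hypothetical compute node plus backend: exported LUNs and local devices."""

    def __init__(self, luns):
        self.exported = set(luns)
        self.devices = set(luns)
        # Hypothetical detach mutex shared by every detach on this node.
        self.detach_lock = threading.Lock()

    def detach(self, lun):
        # Hold the lock across rescan, device removal AND backend unexport,
        # so no other detach can rescan while this LUN is still exported.
        with self.detach_lock:
            self.devices |= self.exported  # rescan rediscovers exported LUNs
            self.devices.discard(lun)      # echo 1 > /sys/block/<dev>/device/delete
            self.exported.discard(lun)     # backend stops exporting the LUN


def detach_all(luns):
    """Run one detach thread per LUN concurrently and return leftover devices."""
    node = Node(luns)
    threads = [threading.Thread(target=node.detach, args=(lun,)) for lun in luns]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return node.devices


print(detach_all(["sdr", "sds"]))  # set()
```

The trade-off is that every detach on the node is serialized. In real deployments the backend unexport is a Cinder call made outside the libvirt driver, so where such a lock would live is an open design question.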

Changed in nova:
status:	New → Triaged

Ihor Kaharlichenko (madkinder) wrote on 2014-06-05:

The same problem happens with Fibre Channel-connected devices that use multipath.

Sam Morrison (sorrison) wrote on 2014-06-05:

Great to know others have this issue! This is a serious problem for us: it causes volumes to get into really bad states, and the only way to fix them is to reboot the compute node.

Ilja Livenson (ilja-livenson) on 2015-03-15

tags:

added: multipath

Sean Dague (sdague) on 2015-03-30

Changed in nova:
status:	Triaged → Confirmed

Matt Riedemann (mriedem) wrote on 2015-09-18:

Is this still an issue in Liberty? Otherwise, see comment 6 in bug 1492026: in Mitaka I'd like to add some event callback code to the libvirt driver so that we can make the volume device attach/detach synchronous before we call out to Cinder/os-brick to do the iSCSI connect/disconnect volume work.
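The general shape of "make detach synchronous via an event callback" can be sketched with a `threading.Event`. The driver asks the hypervisor to detach the device, then blocks until a device-removed callback fires (or a timeout expires) before doing the host-side disconnect. This is not the code proposed for Mitaka. `SyncDetacher`, `on_device_removed`, and `fake_hypervisor` are hypothetical stand-ins for libvirt's event machinery and for os-brick, and the 5-second default timeout is arbitrary.

```python
import threading


class SyncDetacher:
    """Hypothetical sketch: wait for a device-removed event before host-side disconnect."""

    def __init__(self, timeout=5.0):
        self.timeout = timeout
        self.log = []
        self._removed = {}  # device name -> threading.Event
        self._guard = threading.Lock()

    def on_device_removed(self, dev):
        # Callback the hypervisor event loop would invoke (simulated below).
        with self._guard:
            event = self._removed.get(dev)
        if event is not None:
            event.set()

    def detach(self, dev, request_guest_detach):
        event = threading.Event()
        with self._guard:
            self._removed[dev] = event  # register before asking, so no event is missed
        try:
            request_guest_detach(dev)         # asynchronous request to the hypervisor
            if not event.wait(self.timeout):  # block until the callback fires
                raise TimeoutError(f"guest did not release {dev}")
        finally:
            with self._guard:
                self._removed.pop(dev, None)
        self.log.append(f"guest released {dev}")
        # Only now would the host-side iSCSI disconnect run.
        self.log.append(f"host disconnect {dev}")


def fake_hypervisor(detacher):
    """Simulated hypervisor: fires the callback shortly after the request."""
    def request(dev):
        threading.Timer(0.05, detacher.on_device_removed, args=(dev,)).start()
    return request


d = SyncDetacher()
d.detach("vdb", fake_hypervisor(d))
print(d.log)  # ['guest released vdb', 'host disconnect vdb']
```

If the callback never arrives, `detach` raises `TimeoutError` instead of tearing down a connection the guest may still be using.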

Markus Zoeller (markus_z) (mzoeller) wrote on 2016-07-05: Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
Only release names that are still supported are valid (LIBERTY, MITAKA, OCATA, NEWTON).
Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
importance:	Medium → Undecided
status:	Confirmed → Expired
