Multipath device descriptor and iSCSI device not deleted when detaching multiple volumes at the same time on the same host

Bug #1336683 reported by Nikolas Hermanns
This bug affects 6 people
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

For every detached volume, the nova-compute service deletes the volume's multipath devices from the host while detaching. This works fine if the volumes are detached one by one, but not when multiple volumes are detached at the same time.
Nova-compute always rescans the iSCSI devices before it detaches them. The rescan reconnects any missing devices as long as the volume is still exported by the VNX. Between the moment the iSCSI devices are deleted and the moment cinder-volume actually detaches the volume on the VNX side, any rescan will reconnect the iSCSI connections. On the host, all detachments run in a single thread, so they are performed one by one. On the cinder-volume side, however, the detachment is done through multithreading, and nova-compute does not wait between detachments for cinder-volume to finish. As a result, it can happen that a volume's devices are deleted and the detachment of the next volume brings them back.
The next time a volume is attached to this VM on this host, an error will occur. Or, if the attach does work, the volume will stay in use until the VM is terminated; it is not possible to detach the volume again.
Also, the volume information is not updated for the second detachment: the size stays the same even though the new attachment is totally different.

3600601604781340063b2b4293601e411 dm-2 DGC ,VRAID
size=4.0G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=2 status=enabled
| |- 7:0:0:3 sdr 65:16 failed ready running
| `- 6:0:0:3 sdq 65:0 failed ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 9:0:0:3 sds 65:32 failed ready running
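
The race can be illustrated with a small standalone sketch (purely illustrative toy code; helper names such as rescan_iscsi() and backend_terminate_connection() are not actual nova or cinder APIs):

import threading
import time

# State of the toy model: LUNs still exported by the VNX, and the
# iSCSI/multipath devices currently present on the compute host.
lun_exported = {"volume-A": True, "volume-B": True}
devices_on_host = {"volume-A", "volume-B"}

def backend_terminate_connection(volume, delay):
    # cinder-volume side: the export is removed asynchronously, after a delay
    time.sleep(delay)
    lun_exported[volume] = False

def rescan_iscsi():
    # nova side: a rescan re-discovers every LUN the array still exports
    for vol, exported in lun_exported.items():
        if exported:
            devices_on_host.add(vol)

def detach_on_host(volume):
    rescan_iscsi()                        # nova-compute rescans before detaching ...
    devices_on_host.discard(volume)       # ... and then deletes the volume's devices
    # cinder terminates the connection in its own thread; nova does not wait for it
    threading.Thread(target=backend_terminate_connection, args=(volume, 1.0)).start()

detach_on_host("volume-A")
detach_on_host("volume-B")    # this rescan re-adds volume-A's devices
print(devices_on_host)        # {'volume-A'}: a stale multipath device is left behind

Whether the stale device actually remains depends on how quickly the backend removes the export relative to nova's next detach, which is why the problem only shows up when several volumes are detached concurrently against a slow backend.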

Revision history for this message
Nikolas Hermanns (nikolas-hermanns) wrote :

Affected functions:
def _detach_volume(self, context, instance, bdm): in nova/compute/manager.py
def _delete_mpath(self, iscsi_properties, multipath_device, ips_iqns): in nova/virt/libvirt/volume.py
Used backend: EMC VNX 5400
OS: Wind River / Ubuntu

Revision history for this message
Zoltan Arnold Nagy (zoltan) wrote :

I've just run into this on Ubuntu 14.04 + Icehouse 2nd point release with the Storwize backend for cinder.

Revision history for this message
Zoltan Arnold Nagy (zoltan) wrote :

It's even worse in some cases, as I have the iSCSI-mapped block devices lingering around even though they don't exist anymore on the storage, so anything touching them will hang.

Changed in nova:
status: New → Confirmed
Changed in nova:
assignee: nobody → Nikolas Hermanns (nikolas-hermanns)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/172341

Changed in nova:
status: Confirmed → In Progress
Changed in nova:
assignee: Nikolas Hermanns (nikolas-hermanns) → Pavel Kholkin (pkholkin)
Changed in nova:
assignee: Pavel Kholkin (pkholkin) → Nikolas Hermanns (nikolas-hermanns)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/172341
Reason: This patch is very old and appears to not be active any more. I am therefore abandoning it to keep the nova review queue sane. Feel free to restore the change when you're actively working on it again.

Matt Riedemann (mriedem)
tags: added: multipath
Changed in nova:
assignee: Nikolas Hermanns (nikolas-hermanns) → nobody
status: In Progress → Confirmed
Revision history for this message
Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
  Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
status: Confirmed → Expired
Lee Yarwood (lyarwood)
Changed in nova:
status: Expired → Confirmed
assignee: nobody → Lee Yarwood (lyarwood)
Revision history for this message
Lee Yarwood (lyarwood) wrote :

Reopening after seeing this downstream against Kilo / OSP 7. This still looks possible in master, so I'll reproduce early next week using the LVM/iSCSI volume backend with a sleep in terminate_connection to fake a slow volume backend such as the VNX.
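
For reference, a sketch of the kind of artificial slow-down meant here (hypothetical code; the commented import path is an assumption and the exact driver class may differ by release):

import time

def make_slow(original_terminate_connection, delay=30):
    # Wrap a cinder driver's terminate_connection with a sleep so that nova's
    # next detach starts before the previous connection has been torn down,
    # mimicking a slow array such as the VNX.
    def slow_terminate_connection(self, volume, connector, **kwargs):
        time.sleep(delay)
        return original_terminate_connection(self, volume, connector, **kwargs)
    return slow_terminate_connection

# Example wiring (assumed module path, verify against the release in use):
# from cinder.volume.drivers.lvm import LVMVolumeDriver
# LVMVolumeDriver.terminate_connection = make_slow(
#     LVMVolumeDriver.terminate_connection)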

Revision history for this message
Maciej Szankin (mszankin) wrote :

Lee Yarwood, were you able to reproduce it?

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
Lee Yarwood (lyarwood) wrote :

I've been unable to reproduce this with or without the artificial slow-down of the terminate_connection calls. That said, we continue to see this against a customer's VNX backend using dm-multipath with queue_if_no_path enabled. I'll continue trying to reproduce it, so please leave this as `In Progress` for now.

Lee Yarwood (lyarwood)
Changed in nova:
status: In Progress → Incomplete
assignee: Lee Yarwood (lyarwood) → nobody
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired