Multipath device descriptor and iSCSI device not deleted when detaching multiple volumes at the same time on the same host

Bug #1336683 reported by Nikolas Hermanns
This bug affects 6 people
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

For every detached volume, the nova-compute service deletes the volume's multipath devices from the host while detaching. This works fine if the volumes are detached one by one, but not when multiple volumes are detached at the same time.
Nova-compute always rescans the iSCSI devices before it detaches them. The rescan reconnects any missing devices as long as the volume is still exported by the VNX. Between the moment the iSCSI devices are deleted and the moment cinder-volume actually detaches the volume on the VNX side, any rescan will reconnect the iSCSI connections. On the host, all detachments run in a single thread, so they are performed one by one. On the cinder-volume side, however, the detachment is done through multithreading, and nova-compute does not wait between detachments for cinder-volume to finish. As a result, it can happen that a volume's devices are deleted and the detachment of the next volume brings them back.
The next time a volume is attached to this VM on this host, an error will occur. Or, if the attach does work, the volume will stay in use until the VM is terminated; it is not possible to detach the volume again.
Also, the volume information is not updated for the second detachment: the size stays the same even though the new attachment is totally different.

3600601604781340063b2b4293601e411 dm-2 DGC ,VRAID
size=4.0G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=2 status=enabled
| |- 7:0:0:3 sdr 65:16 failed ready running
| `- 6:0:0:3 sdq 65:0 failed ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 9:0:0:3 sds 65:32 failed ready running
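
The race can be illustrated with a small standalone sketch (purely illustrative toy code; helper names such as rescan_iscsi() and backend_terminate_connection() are not actual nova or cinder APIs):

import threading
import time

# State of the toy model: LUNs still exported by the VNX, and the
# iSCSI/multipath devices currently present on the compute host.
lun_exported = {"volume-A": True, "volume-B": True}
devices_on_host = {"volume-A", "volume-B"}

def backend_terminate_connection(volume, delay):
    # cinder-volume side: the export is removed asynchronously, after a delay
    time.sleep(delay)
    lun_exported[volume] = False

def rescan_iscsi():
    # nova side: a rescan re-discovers every LUN the array still exports
    for vol, exported in lun_exported.items():
        if exported:
            devices_on_host.add(vol)

def detach_on_host(volume):
    rescan_iscsi()                        # nova-compute rescans before detaching ...
    devices_on_host.discard(volume)       # ... and then deletes the volume's devices
    # cinder terminates the connection in its own thread; nova does not wait for it
    threading.Thread(target=backend_terminate_connection, args=(volume, 1.0)).start()

detach_on_host("volume-A")
detach_on_host("volume-B")    # this rescan re-adds volume-A's devices
print(devices_on_host)        # {'volume-A'}: a stale multipath device is left behind

Whether the stale device actually remains depends on how quickly the backend removes the export relative to nova's next detach, which is why the problem only shows up when several volumes are detached concurrently against a slow backend.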

Revision history for this message
Nikolas Hermanns (nikolas-hermanns) wrote :

Affected functions:
def _detach_volume(self, context, instance, bdm): in nova/compute/manager.py
def _delete_mpath(self, iscsi_properties, multipath_device, ips_iqns): in nova/virt/libvirt/volume.py
Used backend: EMC VNX 5400
OS: Wind River / Ubuntu

Revision history for this message
Zoltan Arnold Nagy (zoltan) wrote :

I've just run into this on Ubuntu 14.04 + Icehouse 2nd point release with the Storwize backend for cinder.

Revision history for this message
Zoltan Arnold Nagy (zoltan) wrote :

It's even worse in some cases, as I have the iSCSI-mapped block devices lingering around even though they don't exist anymore on the storage, so anything touching them will hang.

Changed in nova:
status: New → Confirmed
Changed in nova:
assignee: nobody → Nikolas Hermanns (nikolas-hermanns)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/172341

Changed in nova:
status: Confirmed → In Progress
Changed in nova:
assignee: Nikolas Hermanns (nikolas-hermanns) → Pavel Kholkin (pkholkin)
Changed in nova:
assignee: Pavel Kholkin (pkholkin) → Nikolas Hermanns (nikolas-hermanns)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/172341
Reason: This patch is very old and appears to not be active any more. I am therefore abandoning it to keep the nova review queue sane. Feel free to restore the change when you're actively working on it again.

Matt Riedemann (mriedem)
tags: added: multipath
Changed in nova:
assignee: Nikolas Hermanns (nikolas-hermanns) → nobody
status: In Progress → Confirmed
Revision history for this message
Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
  Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
status: Confirmed → Expired
Lee Yarwood (lyarwood)
Changed in nova:
status: Expired → Confirmed
assignee: nobody → Lee Yarwood (lyarwood)
Revision history for this message
Lee Yarwood (lyarwood) wrote :

Reopening after seeing this downstream against Kilo / OSP 7. This still looks possible in master, so I'll reproduce early next week using the LVM/iSCSI volume backend with a sleep in terminate_connection to fake a slow volume backend such as the VNX.
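
For reference, a sketch of the kind of artificial slow-down meant here (hypothetical code; the commented import path is an assumption and the exact driver class may differ by release):

import time

def make_slow(original_terminate_connection, delay=30):
    # Wrap a cinder driver's terminate_connection with a sleep so that nova's
    # next detach starts before the previous connection has been torn down,
    # mimicking a slow array such as the VNX.
    def slow_terminate_connection(self, volume, connector, **kwargs):
        time.sleep(delay)
        return original_terminate_connection(self, volume, connector, **kwargs)
    return slow_terminate_connection

# Example wiring (assumed module path, verify against the release in use):
# from cinder.volume.drivers.lvm import LVMVolumeDriver
# LVMVolumeDriver.terminate_connection = make_slow(
#     LVMVolumeDriver.terminate_connection)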

Revision history for this message
Maciej Szankin (mszankin) wrote :

Lee Yarwood, were you able to reproduce it?

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
Lee Yarwood (lyarwood) wrote :

I've been unable to reproduce this with or without the artificial slow-down of the terminate_connection calls. That said, we continue to see this against a customer's VNX backend using dm-multipath with queue_if_no_path enabled. I'll continue trying to reproduce it, so please leave this as `In Progress` for now.

Lee Yarwood (lyarwood)
Changed in nova:
status: In Progress → Incomplete
assignee: Lee Yarwood (lyarwood) → nobody
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired