Device for another volume is deleted unexpectedly during volume detach when iSCSI multipath is used

Bug #1454512 reported by Tina Tang
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Fix Released | Medium | Unassigned |

Bug Description

We found this issue while testing volume detachment with iSCSI multipath. When the same iSCSI portal and IQN are shared by multiple LUNs, devices belonging to other volumes may be deleted unexpectedly. This was found both in Kilo and in the latest code.

For example, the devices under /dev/disk/by-path may look like the listing below when LUN 23 and LUN 231 come from the same storage system and are exposed through the same iSCSI portal and IQN.

ls /dev/disk/by-path
ip-192.168.3.50:3260-iscsi-<iqna>-lun-23
ip-192.168.3.50:3260-iscsi-<iqna>-lun-231
ip-192.168.3.51:3260-iscsi-<iqnb>-lun-23
ip-192.168.3.51:3260-iscsi-<iqnb>-lun-231

When we detach the volume corresponding to LUN 23 from the host, the devices for LUN 231 are also deleted, which may make its data unavailable.

Why does this happen? After digging into the nova code, the clue is below:

nova/virt/libvirt/volume.py
770     def _delete_mpath(self, iscsi_properties, multipath_device, ips_iqns):
771         entries = self._get_iscsi_devices()
772         # Loop through ips_iqns to construct all paths
773         iqn_luns = []
774         for ip, iqn in ips_iqns:
775             iqn_lun = '%s-lun-%s' % (iqn,
776                                      iscsi_properties.get('target_lun', 0))
777             iqn_luns.append(iqn_lun)
778         for dev in ['/dev/disk/by-path/%s' % dev for dev in entries]:
779             for iqn_lun in iqn_luns:
780                 if iqn_lun in dev:  # ==> Incorrect: a device for LUN 231 also makes this True.
781                     self._delete_device(dev)
782
783         self._rescan_multipath()

Because of the incorrect substring check on line 780, detaching LUN xx also deletes the devices of any other LUN whose number starts with xx, such as xxy or xxz. We could use dev.endswith(iqn_lun) to avoid this.
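
The difference between the two checks can be reproduced outside nova with a small standalone sketch. The helper names match_substring and match_suffix and the sample entries (with <iqna>/<iqnb> replaced by the placeholder strings iqna/iqnb) are assumptions made for illustration; they are not part of the nova code.

```python
# Hypothetical sample data modelled on the /dev/disk/by-path listing above.
entries = [
    "ip-192.168.3.50:3260-iscsi-iqna-lun-23",
    "ip-192.168.3.50:3260-iscsi-iqna-lun-231",
    "ip-192.168.3.51:3260-iscsi-iqnb-lun-23",
    "ip-192.168.3.51:3260-iscsi-iqnb-lun-231",
]

# Keys built the same way as in _delete_mpath: '%s-lun-%s' % (iqn, target_lun)
iqn_luns = ["%s-lun-%s" % (iqn, 23) for iqn in ("iqna", "iqnb")]


def match_substring(entries, iqn_luns):
    """Mimic the check on line 780: substring containment."""
    return [e for e in entries if any(key in e for key in iqn_luns)]


def match_suffix(entries, iqn_luns):
    """Proposed check: the entry must end with the iqn-lun key."""
    # str.endswith accepts a tuple of candidate suffixes.
    return [e for e in entries if e.endswith(tuple(iqn_luns))]


if __name__ == "__main__":
    print("substring:", match_substring(entries, iqn_luns))
    print("suffix:   ", match_suffix(entries, iqn_luns))
```

With this sample data the substring check selects all four entries, while the suffix check selects only the two LUN 23 entries. Note that a suffix check would not match an entry that has anything after the LUN number, so this sketch only covers names shaped like the ones in the listing.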
===================================
stack@openstack-performance:~/tina/nova_iscsi_mp/nova$ git log -1
commit f4504f3575b35ec14390b4b678e441fcf953f47b
Merge: 3f21f60 5fbd852
Author: Jenkins <email address hidden>
Date: Tue May 12 22:46:43 2015 +0000

    Merge "Remove db layer hard-code permission checks for network_get_all_by_host"

Tags: volumes
Tina Tang (tina-tang) wrote :

This can be fixed easily; assigning it to myself.

description: updated
Changed in nova:
assignee: nobody → Tina Tang (tina-tang)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/182565

Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by melanie witt (<email address hidden>) on branch: master
Review: https://review.openstack.org/182565
Reason: Tina, thanks for the update. I'll abandon this patch for you. FYI you can abandon patches on which you are owner by clicking the "Abandon" button

melanie witt (melwitt) wrote :

This is apparently fixed by the switch to os-brick:

https://review.openstack.org/#/c/175569/

Changed in nova:
assignee: Tina Tang (tina-tang) → nobody
importance: Undecided → Medium
status: In Progress → Fix Committed
tags: added: volumes
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-3 → 12.0.0