During volume detachment, SCSI devices are not removed correctly.

Bug #1636592 reported by Matthew Heler
Affects: os-brick
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

During a volume detachment, a race condition exists where os-brick will not delete the SCSI devices correctly. The SCSI devices remain logged in and show up as faulty devices.

This is on Liberty, with an XtremIO backend.

root@rax-rpc-1-compute003:~# multipath -ll
3514f0c5dbd8000d8 dm-9 XtremIO,XtremApp
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=enabled
  |- 31:0:0:8 sdab 65:176 failed faulty running
  |- 32:0:0:8 sdac 65:192 failed faulty running
  |- 33:0:0:8 sdaa 65:160 failed faulty running
  `- 34:0:0:8 sdz 65:144 failed faulty running
3514f0c5dbd8000ba dm-10 XtremIO,XtremApp
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=enabled
  |- 31:0:0:9 sdaf 65:240 failed faulty running
  |- 34:0:0:9 sdad 65:208 failed faulty running
  |- 33:0:0:9 sdae 65:224 failed faulty running
  `- 32:0:0:9 sdag 66:0 failed faulty running
3514f0c5dbd8000d7 dm-6 XtremIO,XtremApp
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=enabled
  |- 31:0:0:6 sdt 65:48 failed faulty running
  |- 32:0:0:6 sds 65:32 failed faulty running
  |- 33:0:0:6 sdr 65:16 failed faulty running
  `- 34:0:0:6 sdu 65:64 failed faulty running

root@rax-rpc-1-compute003:~# fdisk /dev/sdab
fdisk: unable to read /dev/sdab: Invalid argument
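
For what it's worth, the leftover devices can also be confirmed from Python by checking whether the kernel still exposes them under /sys/block (a minimal diagnostic sketch, not part of os-brick):

import os

def scsi_device_present(name):
    # True if the kernel still has a SCSI device node such as 'sdab'.
    return os.path.exists('/sys/block/%s/device' % name)

# Paths taken from the multipath -ll output above; all of them are still
# present even though the LUNs were unmapped on the array.
for dev in ('sdz', 'sdaa', 'sdab', 'sdac'):
    print('%s present: %s' % (dev, scsi_device_present(dev)))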

description: updated
Revision history for this message
Matthew Heler (rackspace.matt) wrote :

Environment
===========
- OpenStack Release : Liberty
- OS : Ubuntu 14.04 LTS
- Hypervisor : KVM
- Cinder Storage : iSCSI (EMC XtremIO)
- os-brick (0.6.0)

Revision history for this message
Matthew Heler (rackspace.matt) wrote :

The following patch appears to fix the problem.

--- linuxscsi.py.backup 2016-10-26 05:30:25.999795594 +0000
+++ linuxscsi.py 2016-10-26 18:22:14.830975382 +0000
@@ -19,6 +19,8 @@
 import os
 import re

+import time
+
 from oslo_concurrency import processutils as putils
 from oslo_log import log as logging

@@ -121,12 +123,13 @@

         LOG.debug("remove multipath device %s", device)
         mpath_dev = self.find_multipath_device(device)
+        self.flush_multipath_device(mpath_dev['id'])
         if mpath_dev:
             devices = mpath_dev['devices']
             LOG.debug("multipath LUNs to remove %s", devices)
             for device in devices:
+                time.sleep(2)
                 self.remove_scsi_device(device['device'])
-            self.flush_multipath_device(mpath_dev['id'])

     def flush_device_io(self, device):
         """This is used to flush any remaining IO in the buffers."""

Before, I would get faulty paths during a detachment operation; now the paths are cleaned up correctly.

Revision history for this message
Eric Harney (eharney) wrote :

In bug 1502999, part of the suggestion from comment #3 has been implemented already.

Revision history for this message
Matthew Heler (rackspace.matt) wrote :

I understand. What about the other portion of the patch? Thanks.

Revision history for this message
Chris Breu (chris.breu) wrote :

The moved 'self.flush_multipath_device(mpath_dev['id'])' needs to be placed inside the if statement, as find_multipath_device can return None and mpath_dev['id'] would then fail.
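
In other words, the relocated call would need to sit inside the None check, roughly like this (a sketch of the placement only, reusing the method names from the patch above):

        mpath_dev = self.find_multipath_device(device)
        if mpath_dev:
            # Flush only after we know find_multipath_device() returned a dict.
            self.flush_multipath_device(mpath_dev['id'])
            devices = mpath_dev['devices']
            LOG.debug("multipath LUNs to remove %s", devices)
            for device in devices:
                time.sleep(2)
                self.remove_scsi_device(device['device'])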

Revision history for this message
Matthew Heler (rackspace.matt) wrote :

Bug 1437441 still exists for multipath iSCSI devices. The remove_multipath_device function removes the iSCSI device, but doesn't wait to see whether the device has actually been deleted on busy systems. The result is an orphaned iSCSI session that never gets logged out.
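
For illustration, a wait of that kind could look roughly like the sketch below; the function name, the sysfs check and the timeout value are assumptions for this sketch, not the actual os-brick code:

import os
import time

def wait_for_scsi_removal(name, timeout=10, interval=0.5):
    # Poll until the kernel has really dropped a SCSI device such as 'sdab'.
    # Returns True if the device disappeared within the timeout, False
    # otherwise, so the caller can decide whether it is safe to log out
    # of the iSCSI session.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if not os.path.exists('/sys/block/%s/device' % name):
            return True
        time.sleep(interval)
    return False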

Revision history for this message
Matthew Heler (rackspace.matt) wrote :

https://review.openstack.org/#/c/409881/

The same patch as the one proposed here is up in this review.

Revision history for this message
Gorka Eguileor (gorka) wrote :

This has been fixed in the latest os-brick version with the major iSCSI refactoring.

Changed in os-brick:
status: New → Fix Released