2016-10-25 22:13:23
Matthew Heler

Description:
During a volume detachment, os_brick performs an I/O buffer flush on a SCSI device prior to removing it. The device may not have finished flushing its buffers before os_brick tries to remove it; when that happens, the device is not actually removed from the OS. Unfortunately, os_brick continues on and assumes the device was removed when in fact it was not.
In Kilo, a looping call provided a basic sanity check for this specific condition. A similar check exists in os_brick in the function wait_for_volume_removal, but it does not appear to be used. This patch enables that function for SCSI device removals.
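The looping sanity check described above can be sketched roughly as follows. This is a minimal illustration, not the actual os_brick implementation: the function name matches os_brick's wait_for_volume_removal, but the signature, retry count, and interval here are assumptions.

```python
# Hypothetical sketch of a looping check that a SCSI device node has
# actually disappeared before teardown continues. The retry count and
# sleep interval are illustrative assumptions, not os_brick's values.
import os
import time


def wait_for_volume_removal(volume_path, max_retries=10, interval=2):
    """Poll until the SCSI device node disappears from the OS.

    Returns True once the device is gone, or False if it is still
    present after all retries (e.g. the buffer flush has not finished).
    """
    for _ in range(max_retries):
        if not os.path.exists(volume_path):
            return True  # device is gone; safe to continue teardown
        time.sleep(interval)  # give the kernel time to finish the flush
    return False  # still present; caller must not assume removal succeeded
```

With a check like this, a caller can log an error or retry the removal instead of silently leaving stale device nodes behind, which is what produces the faulty multipath paths shown below.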
Without this patch, multipath will report faulty paths after a Cinder volume detachment, because the devices still exist on the OS and are not properly cleaned up by os_brick.
root@rax-rpc-1-compute003:~# multipath -ll
3514f0c5dbd8000d8 dm-9 XtremIO,XtremApp
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=enabled
|- 31:0:0:8 sdab 65:176 failed faulty running
|- 32:0:0:8 sdac 65:192 failed faulty running
|- 33:0:0:8 sdaa 65:160 failed faulty running
`- 34:0:0:8 sdz 65:144 failed faulty running
3514f0c5dbd8000ba dm-10 XtremIO,XtremApp
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=enabled
|- 31:0:0:9 sdaf 65:240 failed faulty running
|- 34:0:0:9 sdad 65:208 failed faulty running
|- 33:0:0:9 sdae 65:224 failed faulty running
`- 32:0:0:9 sdag 66:0 failed faulty running
3514f0c5dbd8000d7 dm-6 XtremIO,XtremApp
size=5.0G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=enabled
|- 31:0:0:6 sdt 65:48 failed faulty running
|- 32:0:0:6 sds 65:32 failed faulty running
|- 33:0:0:6 sdr 65:16 failed faulty running
`- 34:0:0:6 sdu 65:64 failed faulty running
root@rax-rpc-1-compute003:~# fdisk /dev/sdab
fdisk: unable to read /dev/sdab: Invalid argument
During a volume detachment, a race condition exists where os-brick does not delete the SCSI devices correctly. The SCSI devices remain logged in and show up as faulty devices.
This is on Liberty, with an XtremIO backend.