FC Multipath may leave residual paths due to race condition

Bug #1608614 reported by Gorka Eguileor on 2016-08-01
This bug affects 1 person
Affects    Importance   Assigned to
os-brick   Undecided    Gorka Eguileor

Bug Description

When using FC multipath, we may end up with residual paths: a race condition causes paths that had just been removed to be recreated and left behind.

This can create problems if the storage controller reuses the same WWID.

The race condition occurs between the removal of a SCSI device and the connection of a volume and is due to our scanning parameters being too broad (we use - - -, which means any HBA channel, any SCSI target, and any LUN).
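For illustration, a minimal sketch of what such a wildcard scan request looks like, assuming the standard Linux sysfs interface where a "channel target lun" triple is written to /sys/class/scsi_host/hostN/scan. The function name request_scan and the sysfs_root parameter are hypothetical (the latter exists only so the sketch can be tried against a scratch directory); this is not os-brick's code.

```python
import os


def request_scan(host, channel="-", target="-", lun="-", sysfs_root="/sys"):
    """Ask the kernel to scan one SCSI host for devices.

    Each of channel/target/lun is either a number or "-" (wildcard).
    Writing to the real /sys tree requires root privileges.
    """
    path = os.path.join(sysfs_root, "class", "scsi_host",
                        "host%s" % host, "scan")
    value = "%s %s %s" % (channel, target, lun)
    with open(path, "w") as f:
        f.write(value)
    return path, value
```

With the defaults, the value written is "- - -", so every LUN the storage still exports to that host is (re)discovered, not just the one being attached.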

Example of the race between 2 VMs hosted in the same compute node:
        VM1                                       VM2
 1. A1(Create LUN):WWID X assigned
 2. A2(Scan LUN):Detected sda-WWID X
 3. ...
 4. R1(Delete path):sda removed
 5.                                       A1(Create LUN):WWID Y assigned
 6.                                       A2(Scan LUN):Detected sda-WWID X
                                                       and sdb-WWID Y
 7. R2(Remove LUN):Remove WWID X

So we end up with /dev/sda still present on the host when it should have been removed.
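The interleaving above can be reproduced with a toy model. This is only a sketch: the Host class, its attributes, and the LUN numbers 0 and 1 are invented for illustration and do not correspond to os-brick code or to real device naming.

```python
class Host:
    """Toy model of a compute node and the LUNs exported to it."""

    def __init__(self):
        self.exported = {}  # lun -> WWID, as exported by the storage
        self.devices = {}   # lun -> WWID, SCSI devices present on the host

    def scan(self, lun=None):
        """Discover exported LUNs; lun=None models the '- - -' wildcard."""
        for number, wwid in self.exported.items():
            if lun is None or number == lun:
                self.devices[number] = wwid


def run(narrow):
    host = Host()
    host.exported[0] = "X"               # step 1: VM1 creates LUN, WWID X
    host.scan(0 if narrow else None)     # step 2: VM1 scan detects WWID X
    del host.devices[0]                  # step 4: VM1 removes the path
    host.exported[1] = "Y"               # step 5: VM2 creates LUN, WWID Y
    host.scan(1 if narrow else None)     # step 6: VM2 scan
    del host.exported[0]                 # step 7: VM1 removes the LUN
    return host.devices


leftover = run(narrow=False)   # wildcard scan: device for WWID X comes back
clean = run(narrow=True)       # per-LUN scan: only WWID Y is present
```

In this model the wildcard run ends with devices for both X and Y, while the run that scans only the LUN being attached ends with Y alone.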

Gorka Eguileor (gorka) on 2016-08-01
Changed in os-brick:
assignee: nobody → Gorka Eguileor (gorka)

Fix proposed to branch: master
Review: https://review.openstack.org/349598

Changed in os-brick:
status: New → In Progress
Gorka Eguileor (gorka) on 2016-08-02
description: updated
Yafei Yu (yu-yafei) wrote :

iSCSI multipath may leave residual paths too.

Reviewed: https://review.openstack.org/349598
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=28a4d55a0a465ac36ed012a2d634cb64e8f5d599
Submitter: Jenkins
Branch: master

commit 28a4d55a0a465ac36ed012a2d634cb64e8f5d599
Author: Gorka Eguileor <email address hidden>
Date: Wed Jul 27 14:06:06 2016 +0200

    Fix FC multipath rescan

    Fibre Channel multipath rescan uses wildcards for the host rescan, which
    can end up recreating devices that had just been removed if there's a
    race condition between the removal of a SCSI device and the connection
    of a volume.

    The race condition happens if a rescan done when attaching happens right
    between us removing the path and removing the lun, because the rescan
    will add not only the new path we are attaching, but the old path we are
    removing, since the lun still hasn't been removed.

    This would leave orphaned devices that pollute our environment and will
    be recognized as down paths when the storage controller reuses the same
    WWID.

    This patch narrows the rescan to only rescan for the specific lun
    number, and if possible it also filters the rescan by HBA channel and
    SCSI target ID.

    We only filter by HBA channel and SCSI target ID when we can find this
    information, and that is when the FC storage servers implement a single
    WWNN for all ports.

    Change-Id: Id6ed98d3fb8b4b980de86256dec8eeda84562c98
    Closes-Bug: #1608614
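The narrowing described in the commit message can be sketched as follows. This is an assumption-laden illustration, not the patch itself: the function name narrowed_scans and the ctls argument (a list of (channel, target) pairs that the caller has somehow discovered) are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def narrowed_scans(lun: int,
                   ctls: Optional[Iterable[Tuple[int, int]]] = None
                   ) -> List[str]:
    """Build scan strings restricted to a single LUN.

    If (channel, target) pairs are known, emit one fully specified
    "channel target lun" string per pair. Otherwise fall back to
    wildcards for channel and target, but still pin the LUN number.
    """
    pairs = list(ctls or [])
    if not pairs:
        return ["- - %d" % lun]
    return ["%d %d %d" % (channel, target, lun) for channel, target in pairs]
```

For example, narrowed_scans(3) gives a single "- - 3" string, whereas narrowed_scans(3, [(0, 1), (0, 2)]) gives one string per known target. In both cases a LUN that is in the middle of being removed is not rediscovered unless it shares the LUN number being attached.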

Changed in os-brick:
status: In Progress → Fix Released

Change abandoned by Gorka Eguileor (<email address hidden>) on branch: stable/mitaka
Review: https://review.openstack.org/354577
Reason: No backports on the os-brick library

This issue was fixed in the openstack/os-brick 1.6.0 release.

Change abandoned by Gorka Eguileor (<email address hidden>) on branch: stable/mitaka
Review: https://review.openstack.org/354577
Reason: No love from community :'-(
