Fibre channel not scanning all targets

Bug #2051237 reported by Rajat Dhasmana
Affects: os-brick
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

When we try to discover a Fibre Channel device, we take the following steps:

1. Find device in path "/dev/disk/by-path/<platform>pci-<pci_num>-fc-<target_wwn>-lun-<lun>"
We require the following info to create the path: platform, pci_num, target_wwn, lun.

2. After the first scan, we either find the device (2.1) or not (2.2)

2.1 If the path exists and we are able to read from the device, it is a valid device for use and we exit the loop.

2.2 If the path doesn't show up, we do the following:

2.2.1 Rescan HBAs

2.2.2 Based on the initiator target map, remove the initiator HBAs that don't have access to the target

2.2.3 Find the CTL values by grepping for the target WWN under the /sys/class/fc_transport path

grep -Gil <Target_WWN> /sys/class/fc_transport/target<HOST>:*/port_name

2.2.4 Perform a scan based on the HCTL values

echo "C T L" > /sys/class/scsi_host/host<H>/scan

2.2.5 Repeat step 1 until the retries are exhausted (default: 3)
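
The steps above can be sketched roughly as follows. This is only an illustration, not the os-brick code: `fc_by_path` and `wait_for_device` are hypothetical names, the exact form of the `platform` prefix is an assumption, and the check here only tests that the path exists (the real step 2.1 also requires that the device can be read).

```python
import os


def fc_by_path(pci_num, target_wwn, lun, platform=""):
    """Build the by-path name from step 1.

    `platform` is assumed to be either empty or an already formatted
    prefix; `target_wwn` is used exactly as passed in.
    """
    return "/dev/disk/by-path/%spci-%s-fc-%s-lun-%s" % (
        platform, pci_num, target_wwn, lun)


def wait_for_device(paths, rescan, attempts=3, exists=os.path.exists):
    """Look for any candidate path, rescanning between attempts.

    `rescan` is a caller-supplied callable standing in for steps
    2.2.1 to 2.2.4. Returns the first path found, or None once the
    attempts are exhausted.
    """
    for attempt in range(attempts):
        for path in paths:
            if exists(path):
                return path
        # No point in rescanning after the final lookup.
        if attempt < attempts - 1:
            rescan()
    return None
```

Passing `exists` and `rescan` in as callables keeps the sketch runnable without any FC hardware.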

The problem with this approach is in step 2.2.3, where we grep for the CTLs in the /sys/class/fc_transport path. This path only contains the targets that already have a LUN connected to the host.

Example: Suppose we have 2 controllers on the backend side with 4 targets each.
For the first LUN mapping from controller1, we do a wildcard scan and find the 4 targets from controller1, which get populated in the /sys/class/fc_transport path.
If we then try a LUN attachment from controller2, we look for targets in the fc_transport path, but it only contains targets from controller1, so we are not able to discover the LUN.

Doing a manual wildcard scan will discover the targets from controller2.

So /sys/class/fc_transport is not a reliable way to discover the targets available to the host, and we need another mechanism to find the targets reachable from the host.
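
For reference, the difference between the targeted scan in step 2.2.4 and the manual wildcard scan mentioned above is only the string written to the host's scan file. A minimal sketch, assuming a hypothetical helper `scan_host` and a `sysfs_root` parameter added purely so it can be tried against a scratch directory (writing to the real file needs root):

```python
import os

# "- - -" asks the host to scan every channel, target and LUN.
WILDCARD = ("-", "-", "-")


def scan_host(host_num, ctl=WILDCARD, sysfs_root="/sys"):
    """Write "C T L" to <sysfs_root>/class/scsi_host/host<H>/scan.

    Returns the string that was written, without the trailing newline.
    """
    scan_file = os.path.join(
        sysfs_root, "class", "scsi_host", "host%s" % host_num, "scan")
    payload = " ".join(str(value) for value in ctl)
    with open(scan_file, "w") as f:
        # Mirror `echo`, which appends a newline.
        f.write(payload + "\n")
    return payload
```

Calling `scan_host(7)` corresponds to the wildcard scan, while `scan_host(7, (0, 3, 1))` scans only channel 0, target 3, LUN 1.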

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-brick (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/os-brick/+/906743

Changed in os-brick:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-brick (master)

Reviewed: https://review.opendev.org/c/openstack/os-brick/+/906743
Committed: https://opendev.org/openstack/os-brick/commit/f2154eedf0d04ef960dbf1df7fb87f74f6a35dbf
Submitter: "Zuul (22348)"
Branch: master

commit f2154eedf0d04ef960dbf1df7fb87f74f6a35dbf
Author: Rajat Dhasmana <email address hidden>
Date: Thu Jan 25 20:12:55 2024 +0530

    Fix: FC partial target scan

    When fetching the target value (T in HCTL) for the storage HBAs,
    we use the /sys/class/fc_transport path to find available targets.
    However, this path only contains targets that already have a LUN
    attached to the host.

    Scenario:
    Suppose we have 2 controllers on the backend side with 4 target HBAs each (total 8).
    For the first LUN mapping from controller1, we will do a wildcard
    scan and find the 4 targets from controller1 which will get
    populated in the /fc_transport path.
    If we try mapping a LUN from controller2, we try to find targets in the
    fc_transport path but the path only contains targets from controller1 so
    we will not be able to discover the LUN from controller2 and fail with
    NoFibreChannelVolumeDeviceFound exception.

    Solution:
    In each rescan attempt, we will first search for targets in the
    fc_transport path: "/sys/class/fc_transport/target<host>*".
    If the target is not found then we will search in the fc_remote_ports
    path: "/sys/class/fc_remote_ports/rport-<host>*"

    If a [c,t,l] combination is found from either path, we add it to
    the list of ctls that we later use for scanning.

    This way, we don't alter the current "working" mechanism of scanning
    but also add an additional way of discovering targets and improving
    the scan to avoid failure scenarios in each rescan attempt.

    Closes-Bug: #2051237
    Change-Id: Ia74b0fc24e0cf92453e65d15b4a76e565ed04d16
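
The lookup order described under "Solution" in the commit message can be sketched as below. This is not the merged code, just an approximation: `find_ctls` and `_matching_dirs` are hypothetical names, `sysfs_root` exists only so the function can be pointed at a scratch tree, and the assumptions that rport directories are named `rport-<H>:<C>-<N>` and expose a `scsi_target_id` file should be checked against the actual host.

```python
import glob
import os
import re


def _matching_dirs(dir_pattern, wwn):
    """Yield directories whose port_name contains `wwn`.

    The match is case-insensitive, similar to `grep -il`.
    """
    for port_file in sorted(glob.glob(os.path.join(dir_pattern, "port_name"))):
        with open(port_file) as f:
            if wwn.lower() in f.read().lower():
                yield os.path.dirname(port_file)


def find_ctls(host_num, target_wwn, lun, sysfs_root="/sys"):
    """Return a list of [channel, target, lun] for one target WWN."""
    ctls = []
    transport = os.path.join(
        sysfs_root, "class", "fc_transport", "target%s:*" % host_num)
    for d in _matching_dirs(transport, target_wwn):
        # Directory name is target<H>:<C>:<T>.
        _host, channel, target = os.path.basename(d)[len("target"):].split(":")
        ctls.append([channel, target, str(lun)])
    if ctls:
        return ctls

    # Fallback: the target has no LUN attached yet, so it is not
    # listed under fc_transport. Try fc_remote_ports instead.
    rports = os.path.join(
        sysfs_root, "class", "fc_remote_ports", "rport-%s:*" % host_num)
    for d in _matching_dirs(rports, target_wwn):
        # Assumption: name is rport-<H>:<C>-<N>, channel is <C>.
        match = re.match(r"rport-\d+:(\d+)-\d+$", os.path.basename(d))
        if not match:
            continue
        # Assumption: the target id is exposed as scsi_target_id.
        with open(os.path.join(d, "scsi_target_id")) as f:
            target = f.read().strip()
        ctls.append([match.group(1), target, str(lun)])
    return ctls
```

An empty result means neither path knows the target, in which case the caller still has to decide whether to skip the HBA or fall back to a wildcard scan.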

Changed in os-brick:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-brick (stable/2023.2)

Fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/os-brick/+/909083

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-brick 6.7.0

This issue was fixed in the openstack/os-brick 6.7.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-brick (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/os-brick/+/909083
Committed: https://opendev.org/openstack/os-brick/commit/985ef3cc45ac105730c940d954a537d12bb4ec89
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit 985ef3cc45ac105730c940d954a537d12bb4ec89
Author: Rajat Dhasmana <email address hidden>
Date: Thu Jan 25 20:12:55 2024 +0530

    Fix: FC partial target scan

    Closes-Bug: #2051237
    Change-Id: Ia74b0fc24e0cf92453e65d15b4a76e565ed04d16
    (cherry picked from commit f2154eedf0d04ef960dbf1df7fb87f74f6a35dbf)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-brick (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/os-brick/+/910313

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-brick (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/os-brick/+/910313
Committed: https://opendev.org/openstack/os-brick/commit/c526c3beb1f6aebaa8ec8878a292c9deff582e16
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit c526c3beb1f6aebaa8ec8878a292c9deff582e16
Author: Rajat Dhasmana <email address hidden>
Date: Thu Jan 25 20:12:55 2024 +0530

    Fix: FC partial target scan

    Closes-Bug: #2051237
    Change-Id: Ia74b0fc24e0cf92453e65d15b4a76e565ed04d16
    (cherry picked from commit f2154eedf0d04ef960dbf1df7fb87f74f6a35dbf)
    (cherry picked from commit 985ef3cc45ac105730c940d954a537d12bb4ec89)
