os_brick.initiator.linuxfc incorrectly looks for HBA WWPN when searching for SAN host

Bug #1687607 reported by daimon
This bug affects 2 people
Affects               Status        Importance  Assigned to  Milestone
Ubuntu Cloud Archive  Fix Released  Undecided   Unassigned
Pike                  Fix Released  High        Unassigned
Queens                Fix Released  Undecided   Unassigned
Rocky                 Fix Released  Undecided   Unassigned
Stein                 Fix Released  Undecided   Unassigned
os-brick              Fix Released  Undecided   Unassigned

Bug Description

On a 3par host connected via FC, when trying to live migrate an instance, os_brick causes the following error:

2017-05-02 13:17:09.529 8334 ERROR os_brick.initiator.linuxfc [-] Could not get HBA channel and SCSI target ID, path: /sys/class/fc_transport/target0:, reason: [Errno 2] No such file or directory
Command: grep 50060b0000c29e05 /sys/class/fc_transport/target0:*/node_name
Exit code: -
Stdout: None
Stderr: None

The problem seems to be that os_brick is looking for the wrong WWN: 50060b0000c29e05 belongs to the local host HBA, while the sysfs file contains the WWN of the remote 3par host:
# cat /sys/class/fc_transport/target0:*/node_name
0x2ff70002ac015d71
0x2ff70002ac015d71
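
To make the mismatch concrete, here is a minimal sketch of the comparison, using the two values from this report. The helper names `normalize_wwn` and `wwn_matches` are hypothetical and are not part of os_brick; the sketch only assumes that sysfs reports WWNs in lowercase hex with a `0x` prefix, as the `cat` output above shows.

```python
def normalize_wwn(value):
    """Lowercase a WWN and drop surrounding whitespace and an optional 0x prefix."""
    value = value.strip().lower()
    return value[2:] if value.startswith("0x") else value


def wwn_matches(wanted, sysfs_value):
    """Compare a WWN against the content of a sysfs node_name/port_name file."""
    return normalize_wwn(wanted) == normalize_wwn(sysfs_value)


# Values taken from this report.
local_hba_wwnn = "50060b0000c29e05"     # what the grep searched for
sysfs_node_name = "0x2ff70002ac015d71\n"  # what the file actually contains

print(wwn_matches(local_hba_wwnn, sysfs_node_name))      # False
print(wwn_matches("2FF70002AC015D71", sysfs_node_name))  # True
```

The local HBA's WWN can never match here, because the files under /sys/class/fc_transport describe the remote ports, not the local adapter.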

Multipathing is activated.

Full debug log from nova-compute [I added the STK HBA INFO debug output to linuxfc.get_fc_hbas_info() myself]:
2017-05-02 13:17:09.495 8334 DEBUG os_brick.initiator.linuxfc [req-b8fcdd17-33c1-4a17-8f90-19520ba96e8c abf39a43136b4a03832598dd65cb3677 0629c3ecd6f3468aa47f60331fa383a6 - - -] STK HBA INFO: port_name=50060b0000c29e06, node_name=50060b0000c29e07, host_device=host1, device_path=/sys/devices/pci0000:00/0000:00:03.0/0000:09:00.1/host1/fc_host/host1 get_fc_hbas_info /usr/local/lib/python2.7/dist-packages/os_brick/initiator/linuxfc.py:157
2017-05-02 13:17:09.496 8334 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:09:00.0-fc-0x20110002ac015d71-lun-1 _wait_for_device_discovery /usr/local/lib/python2.7/dist-packages/os_brick/initiator/connectors/fibre_channel.py:145
2017-05-02 13:17:09.496 8334 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:09:00.0-fc-0x20120002ac015d71-lun-1 _wait_for_device_discovery /usr/local/lib/python2.7/dist-packages/os_brick/initiator/connectors/fibre_channel.py:145
2017-05-02 13:17:09.497 8334 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:09:00.0-fc-0x21110002ac015d71-lun-1 _wait_for_device_discovery /usr/local/lib/python2.7/dist-packages/os_brick/initiator/connectors/fibre_channel.py:145
2017-05-02 13:17:09.497 8334 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:09:00.0-fc-0x21120002ac015d71-lun-1 _wait_for_device_discovery /usr/local/lib/python2.7/dist-packages/os_brick/initiator/connectors/fibre_channel.py:145
2017-05-02 13:17:09.498 8334 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:09:00.1-fc-0x20110002ac015d71-lun-1 _wait_for_device_discovery /usr/local/lib/python2.7/dist-packages/os_brick/initiator/connectors/fibre_channel.py:145
2017-05-02 13:17:09.498 8334 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:09:00.1-fc-0x20120002ac015d71-lun-1 _wait_for_device_discovery /usr/local/lib/python2.7/dist-packages/os_brick/initiator/connectors/fibre_channel.py:145
2017-05-02 13:17:09.498 8334 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:09:00.1-fc-0x21110002ac015d71-lun-1 _wait_for_device_discovery /usr/local/lib/python2.7/dist-packages/os_brick/initiator/connectors/fibre_channel.py:145
2017-05-02 13:17:09.499 8334 DEBUG os_brick.initiator.connectors.fibre_channel [-] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:09:00.1-fc-0x21120002ac015d71-lun-1 _wait_for_device_discovery /usr/local/lib/python2.7/dist-packages/os_brick/initiator/connectors/fibre_channel.py:145
2017-05-02 13:17:09.499 8334 INFO os_brick.initiator.connectors.fibre_channel [-] Fibre Channel volume device not yet found. Will rescan & retry. Try number: 0.
2017-05-02 13:17:09.499 8334 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): grep 50060b0000c29e05 /sys/class/fc_transport/target0:*/node_name execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:355
2017-05-02 13:17:09.528 8334 DEBUG oslo_concurrency.processutils [-] u'grep 50060b0000c29e05 /sys/class/fc_transport/target0:*/node_name' failed. Not Retrying. execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:433
2017-05-02 13:17:09.529 8334 ERROR os_brick.initiator.linuxfc [-] Could not get HBA channel and SCSI target ID, path: /sys/class/fc_transport/target0:, reason: [Errno 2] No such file or directory
Command: grep 50060b0000c29e05 /sys/class/fc_transport/target0:*/node_name
Exit code: -
Stdout: None
Stderr: None
2017-05-02 13:17:09.529 8334 DEBUG os_brick.initiator.linuxfc [-] Scanning host host0 (wwnn: 50060b0000c29e05, c: -, t: -, l: 1) rescan_hosts /usr/local/lib/python2.7/dist-packages/os_brick/initiator/linuxfc.py:77
2017-05-02 13:17:09.531 8400 DEBUG oslo.privsep.daemon [-] privsep: request[140715427872432]: (3, 'os_brick.privileged.rootwrap.execute_root', ('tee', '-a', u'/sys/class/scsi_host/host0/scan'), {'process_input': '- - 1'}) loop /usr/local/lib/python2.7/dist-packages/oslo_privsep/daemon.py:443
2017-05-02 13:17:09.531 8400 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): tee -a /sys/class/scsi_host/host0/scan execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:355
2017-05-02 13:17:09.550 8400 DEBUG oslo_concurrency.processutils [-] CMD "tee -a /sys/class/scsi_host/host0/scan" returned: 0 in 0.019s execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:385
2017-05-02 13:17:09.551 8400 DEBUG oslo.privsep.daemon [-] privsep: reply[140715427872432]: (4, ('- - 1', '')) loop /usr/local/lib/python2.7/dist-packages/oslo_privsep/daemon.py:456
2017-05-02 13:17:09.552 8334 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): grep 50060b0000c29e07 /sys/class/fc_transport/target1:*/node_name execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:355
2017-05-02 13:17:09.581 8334 DEBUG oslo_concurrency.processutils [-] u'grep 50060b0000c29e07 /sys/class/fc_transport/target1:*/node_name' failed. Not Retrying. execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:433
2017-05-02 13:17:09.582 8334 ERROR os_brick.initiator.linuxfc [-] Could not get HBA channel and SCSI target ID, path: /sys/class/fc_transport/target1:, reason: [Errno 2] No such file or directory
Command: grep 50060b0000c29e07 /sys/class/fc_transport/target1:*/node_name
Exit code: -
Stdout: None
Stderr: None
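
For readers unfamiliar with the rescan step in the log above: the value written to /sys/class/scsi_host/hostN/scan is a "channel target lun" triple in which `-` acts as a wildcard. Because the channel and target lookup failed, the log shows `- - 1`, i.e. every channel and target is scanned for LUN 1. The sketch below only builds that string; `build_scan_string` is a hypothetical helper, not an os_brick function.

```python
def build_scan_string(channel=None, target=None, lun=None):
    """Build the 'channel target lun' triple for a SCSI host scan file.

    Any value left as None becomes '-', the wildcard.
    """
    parts = (channel, target, lun)
    return " ".join("-" if p is None else str(p) for p in parts)


# Channel and target unknown, LUN 1: matches the '- - 1' seen in the log.
print(build_scan_string(lun=1))

# With channel and target known, the scan is limited to a single path.
print(build_scan_string(channel=0, target=1, lun=1))
```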

# uname -a
Linux os-compute-01 4.4.0-77-generic #98-Ubuntu SMP Wed Apr 26 08:34:02 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
# apt show nova-compute 2>/dev/null | grep Version
Version: 2:15.0.2-0ubuntu1~cloud0
# apt show cinder-volume 2>/dev/null | grep Version
Version: 2:10.0.0-0ubuntu2~cloud0
# pip show os_brick | grep Version
Version: 1.12.0

Arnon Yaari (arnony) wrote :
Gorka Eguileor (gorka) wrote :

Patch submitted to fix this: https://review.openstack.org/520052

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-brick (master)

Reviewed: https://review.openstack.org/520052
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=4ee404466d3f2ced8bfdfb18927ef19a27967952
Submitter: Zuul
Branch: master

commit 4ee404466d3f2ced8bfdfb18927ef19a27967952
Author: Gorka Eguileor <email address hidden>
Date: Wed Nov 8 21:03:08 2017 +0100

    Fixing FC scanning

    Current FC tries to limit the scanning range by detecting the target and
    channel, unfortunately this code has a good number of implementation
    issues:

    - Matching uses local WWNN instead of target's WWPN.
    - Not using a shell to run the command, so the * glob won't expand.
    - Not using -l on grep command to list file names instead of contents.
    - Not making the search case insensitive.

    This patch fixes all these issues by using the target's WWPNs instead
    -taking into account FC Zone/Access control information if present- and
    supporting both possible connection information formats for the WWPNs
    (single value or list of values).

    Rescan tests have been modified to adhere to unit tests best practices,
    where each test case only tests the specific code in the method under
    test and mocks everything else.

    Closes-Bug: #1664653
    Closes-Bug: #1684996
    Closes-Bug: #1687607
    Change-Id: Ib539f6a3652bab4399c30cd90f326829e839ec02
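
The four issues listed in the commit message can be illustrated with a small pure-Python sketch. This is not the code from the patch: it replaces the grep call with `glob` and file reads, and the function name `find_targets` and its parameters are hypothetical. It assumes the sysfs layout shown in this report (`targetH:C:T` directories under fc_transport), and that each of them contains a `port_name` file next to `node_name`.

```python
import glob
import os
import re


def _norm(wwn):
    """Lowercase a WWN and drop whitespace and an optional 0x prefix."""
    wwn = wwn.strip().lower()
    return wwn[2:] if wwn.startswith("0x") else wwn


def find_targets(fc_transport_dir, host_number, target_wwpns):
    """Return (channel, target) pairs on one host whose port_name matches.

    target_wwpns may be a single string or a list of strings.
    """
    if isinstance(target_wwpns, str):
        target_wwpns = [target_wwpns]
    wanted = {_norm(w) for w in target_wwpns}  # case-insensitive match

    # glob expands the '*' itself, so no shell is involved.
    pattern = os.path.join(
        fc_transport_dir, "target%s:*" % host_number, "port_name")

    found = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            if _norm(f.read()) not in wanted:
                continue
        # Like 'grep -l', we care about which file matched, not its content.
        match = re.search(r"target\d+:(\d+):(\d+)", path)
        if match:
            found.append((match.group(1), match.group(2)))
    return found
```

On a real compute node this would be called with something like `find_targets("/sys/class/fc_transport", 0, ["20110002ac015d71", "21110002ac015d71"])`, using the target WWPNs from the connection information rather than the local HBA's WWNN.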

Changed in os-brick:
status: New → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-brick 2.3.0

This issue was fixed in the openstack/os-brick 2.3.0 release.

Changed in cloud-archive:
status: New → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-brick (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/622348

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-brick (stable/pike)

Reviewed: https://review.openstack.org/622348
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=f821a87ef0fd59cb1b48a85eed16b1f3f16e7467
Submitter: Zuul
Branch: stable/pike

commit f821a87ef0fd59cb1b48a85eed16b1f3f16e7467
Author: Gorka Eguileor <email address hidden>
Date: Wed Nov 8 21:03:08 2017 +0100

    Fixing FC scanning

    Current FC tries to limit the scanning range by detecting the target and
    channel, unfortunately this code has a good number of implementation
    issues:

    - Matching uses local WWNN instead of target's WWPN.
    - Not using a shell to run the command, so the * glob won't expand.
    - Not using -l on grep command to list file names instead of contents.
    - Not making the search case insensitive.

    This patch fixes all these issues by using the target's WWPNs instead
    -taking into account FC Zone/Access control information if present- and
    supporting both possible connection information formats for the WWPNs
    (single value or list of values).

    Rescan tests have been modified to adhere to unit tests best practices,
    where each test case only tests the specific code in the method under
    test and mocks everything else.

    Closes-Bug: #1664653
    Closes-Bug: #1684996
    Closes-Bug: #1687607
    Change-Id: Ib539f6a3652bab4399c30cd90f326829e839ec02
    (cherry picked from commit 4ee404466d3f2ced8bfdfb18927ef19a27967952)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-brick 1.15.7

This issue was fixed in the openstack/os-brick 1.15.7 release.
