FC rescan issue when looking at specific HBAs

Bug #1774283 reported by Patrick East
Affects: os-brick
Status: Fix Released
Importance: Undecided
Assigned to: Patrick East

Bug Description

This looks like an issue introduced by https://review.openstack.org/#/c/562219/

Volumes fail to attach with an error in n-cpu like:

May 30 22:13:03 fc-cinder-dev-3 nova-compute[21757]: DEBUG os_brick.initiator.connectors.fibre_channel [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] ==> connect_volume: call u"{'args': (<os_brick.initiator.connectors.fibre_channel.FibreChannelConnector object at 0x7fe443fcc490>, {u'initiator_target_map': {u'21000024ff50add1': [u'524A937377E70300', u'524A937377E70301', u'524A937377E70310', u'524A937377E70311'], u'21000024ff50add0': [u'524A937377E70300', u'524A937377E70301', u'524A937377E70310', u'524A937377E70311']}, u'target_discovered': True, u'encrypted': False, u'qos_specs': None, u'discard': True, u'target_lun': 2, u'access_mode': u'rw', u'target_wwn': [u'524A937377E70300', u'524A937377E70301', u'524A937377E70310', u'524A937377E70311']}), 'kwargs': {}}" {{(pid=21757) trace_logging_wrapper /usr/local/lib/python2.7/dist-packages/os_brick/utils.py:146}}
May 30 22:13:03 fc-cinder-dev-3 nova-compute[21757]: DEBUG oslo_concurrency.lockutils [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] Lock "connect_volume" acquired by "os_brick.initiator.connectors.fibre_channel.connect_volume" :: waited 0.000s {{(pid=21757) inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:273}}
May 30 22:13:03 fc-cinder-dev-3 nova-compute[21757]: DEBUG os_brick.initiator.connectors.fibre_channel [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] execute = <bound method FibreChannelConnector._execute of <os_brick.initiator.connectors.fibre_channel.FibreChannelConnector object at 0x7fe443fcc490>> {{(pid=21757) connect_volume /usr/local/lib/python2.7/dist-packages/os_brick/initiator/connectors/fibre_channel.py:126}}
May 30 22:13:03 fc-cinder-dev-3 nova-compute[21757]: DEBUG oslo.privsep.daemon [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] privsep: request[140618369673008]: (3, 'os_brick.privileged.rootwrap.execute_root', ('systool', '-c', 'fc_host', '-v'), {}) {{(pid=3600) loop /usr/local/lib/python2.7/dist-packages/oslo_privsep/daemon.py:443}}
May 30 22:13:03 fc-cinder-dev-3 nova-compute[21757]: DEBUG oslo_concurrency.processutils [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] Running cmd (subprocess): systool -c fc_host -v {{(pid=3600) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:372}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG oslo_concurrency.processutils [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] CMD "systool -c fc_host -v" returned: 0 in 1.295s {{(pid=3600) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:409}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG oslo.privsep.daemon [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] privsep: reply[140618369673008]: (4, ('Class = "fc_host"\n\n Class Device = "host2"\n Class Device path = "/sys/devices/pci0000:00/0000:00:06.0/host2/fc_host/host2"\n dev_loss_tmo = "30"\n fabric_name = "0x2fa28c604f722601"\n issue_lip = <store method only>\n max_npiv_vports = "0"\n node_name = "0x20000024ff50add1"\n npiv_vports_inuse = "0"\n port_id = "0x172000"\n port_name = "0x21000024ff50add1"\n port_state = "Online"\n port_type = "NPort (fabric via point-to-point)"\n speed = "8 Gbit"\n supported_classes = "Class 3"\n supported_speeds = "1 Gbit, 2 Gbit, 4 Gbit, 8 Gbit"\n symbolic_name = "QLE2562 FW:v5.09.00 DVR:v8.07.00.26-k"\n system_hostname = ""\n tgtid_bind_type = "wwpn (World Wide Port Name)"\n uevent = \n vport_create = <store method only>\n vport_delete = <store method only>\n\n Device = "host2"\n Device path = "/sys/devices/pci0000:00/0000:00:06.0/host2"\n fw_dump = \n issue_logo = <store method only>\n nvram = "ISP \x01"\n optrom_ctl = <store method only>\n optrom = \n reset = <store method only>\n sfp = "\x03\x04\x07"\n uevent = "DEVTYPE=scsi_host"\n vpd = "\x82."\n\n\n Class Device = "host3"\n Class Device path = "/sys/devices/pci0000:00/0000:00:07.0/host3/fc_host/host3"\n dev_loss_tmo = "30"\n fabric_name = "0x20018c604f9334f1"\n issue_lip = <store method only>\n max_npiv_vports = "0"\n node_name = "0x20000024ff50add0"\n npiv_vports_inuse = "0"\n port_id = "0xaa0d00"\n port_name = "0x21000024ff50add0"\n port_state
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: = "Online"\n port_type = "NPort (fabric via point-to-point)"\n speed = "8 Gbit"\n supported_classes = "Class 3"\n supported_speeds = "1 Gbit, 2 Gbit, 4 Gbit, 8 Gbit"\n symbolic_name = "QLE2562 FW:v5.09.00 DVR:v8.07.00.26-k"\n system_hostname = ""\n tgtid_bind_type = "wwpn (World Wide Port Name)"\n uevent = \n vport_create = <store method only>\n vport_delete = <store method only>\n\n Device = "host3"\n Device path = "/sys/devices/pci0000:00/0000:00:07.0/host3"\n fw_dump = \n issue_logo = <store method only>\n nvram = "ISP \x01"\n optrom_ctl = <store method only>\n optrom = \n reset = <store method only>\n sfp = "\x03\x04\x07"\n uevent = "DEVTYPE=scsi_host"\n vpd = "\x82."\n\n\n', '')) {{(pid=3600) loop /usr/local/lib/python2.7/dist-packages/oslo_privsep/daemon.py:456}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG os_brick.initiator.connectors.fibre_channel [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:00:06.0-fc-0x524a937377e70300-lun-2 {{(pid=21757) _wait_for_device_discovery /usr/local/lib/python2.7/dist-packages/os_brick/initiator/connectors/fibre_channel.py:144}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG os_brick.initiator.connectors.fibre_channel [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:00:06.0-fc-0x524a937377e70301-lun-2 {{(pid=21757) _wait_for_device_discovery /usr/local/lib/python2.7/dist-packages/os_brick/initiator/connectors/fibre_channel.py:144}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG os_brick.initiator.connectors.fibre_channel [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:00:06.0-fc-0x524a937377e70310-lun-2 {{(pid=21757) _wait_for_device_discovery /usr/local/lib/python2.7/dist-packages/os_brick/initiator/connectors/fibre_channel.py:144}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG os_brick.initiator.connectors.fibre_channel [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:00:06.0-fc-0x524a937377e70311-lun-2 {{(pid=21757) _wait_for_device_discovery /usr/local/lib/python2.7/dist-packages/os_brick/initiator/connectors/fibre_channel.py:144}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG os_brick.initiator.connectors.fibre_channel [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:00:07.0-fc-0x524a937377e70300-lun-2 {{(pid=21757) _wait_for_device_discovery /usr/local/lib/python2.7/dist-packages/os_brick/initiator/connectors/fibre_channel.py:144}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG os_brick.initiator.connectors.fibre_channel [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:00:07.0-fc-0x524a937377e70301-lun-2 {{(pid=21757) _wait_for_device_discovery /usr/local/lib/python2.7/dist-packages/os_brick/initiator/connectors/fibre_channel.py:144}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG os_brick.initiator.connectors.fibre_channel [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:00:07.0-fc-0x524a937377e70310-lun-2 {{(pid=21757) _wait_for_device_discovery /usr/local/lib/python2.7/dist-packages/os_brick/initiator/connectors/fibre_channel.py:144}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG os_brick.initiator.connectors.fibre_channel [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] Looking for Fibre Channel dev /dev/disk/by-path/pci-0000:00:07.0-fc-0x524a937377e70311-lun-2 {{(pid=21757) _wait_for_device_discovery /usr/local/lib/python2.7/dist-packages/os_brick/initiator/connectors/fibre_channel.py:144}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: INFO os_brick.initiator.connectors.fibre_channel [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] Fibre Channel volume device not yet found. Will rescan & retry. Try number: 0.
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG os_brick.initiator.linuxfc [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] Rescaning HBAs [{'host_device': u'host2', 'device_path': u'/sys/devices/pci0000:00/0000:00:06.0/host2/fc_host/host2', 'port_name': u'21000024ff50add1', 'node_name': u'20000024ff50add1'}, {'host_device': u'host3', 'device_path': u'/sys/devices/pci0000:00/0000:00:07.0/host3/fc_host/host3', 'port_name': u'21000024ff50add0', 'node_name': u'20000024ff50add0'}] with connection properties {u'initiator_target_map': {u'21000024ff50add1': [u'524A937377E70300', u'524A937377E70301', u'524A937377E70310', u'524A937377E70311'], u'21000024ff50add0': [u'524A937377E70300', u'524A937377E70301', u'524A937377E70310', u'524A937377E70311']}, u'target_discovered': True, u'encrypted': False, u'qos_specs': None, u'discard': True, u'target_lun': 2, u'access_mode': u'rw', u'target_wwn': [u'524A937377E70300', u'524A937377E70301', u'524A937377E70310', u'524A937377E70311']} {{(pid=21757) rescan_hosts /usr/local/lib/python2.7/dist-packages/os_brick/initiator/linuxfc.py:82}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG os_brick.initiator.linuxfc [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] Using initiator target map to exclude HBAs {{(pid=21757) rescan_hosts /usr/local/lib/python2.7/dist-packages/os_brick/initiator/linuxfc.py:90}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG oslo_concurrency.processutils [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] Running cmd (subprocess): grep -Gil "524A937377E70300\|524A937377E70301\|524A937377E70310\|524A937377E70311" /sys/class/fc_transport/target2:*/port_name {{(pid=21757) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:372}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG oslo_concurrency.processutils [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] CMD "grep -Gil "524A937377E70300\|524A937377E70301\|524A937377E70310\|524A937377E70311" /sys/class/fc_transport/target2:*/port_name" returned: 2 in 0.043s {{(pid=21757) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:409}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG oslo_concurrency.processutils [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] u'grep -Gil "524A937377E70300\\|524A937377E70301\\|524A937377E70310\\|524A937377E70311" /sys/class/fc_transport/target2:*/port_name' failed. Not Retrying. {{(pid=21757) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:457}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG os_brick.initiator.linuxfc [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] Could not get HBA channel and SCSI target ID, path: /sys/class/fc_transport/target2:*, reason: Unexpected error while running command.
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: Command: grep -Gil "524A937377E70300\|524A937377E70301\|524A937377E70310\|524A937377E70311" /sys/class/fc_transport/target2:*/port_name
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: Exit code: 2
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: Stdout: u''
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: Stderr: u'grep: /sys/class/fc_transport/target2:*/port_name: No such file or directory\n' {{(pid=21757) _get_hba_channel_scsi_target /usr/local/lib/python2.7/dist-packages/os_brick/initiator/linuxfc.py:76}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG oslo_concurrency.processutils [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] Running cmd (subprocess): grep -Gil "524A937377E70300\|524A937377E70301\|524A937377E70310\|524A937377E70311" /sys/class/fc_transport/target3:*/port_name {{(pid=21757) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:372}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG oslo_concurrency.processutils [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] CMD "grep -Gil "524A937377E70300\|524A937377E70301\|524A937377E70310\|524A937377E70311" /sys/class/fc_transport/target3:*/port_name" returned: 0 in 0.015s {{(pid=21757) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:409}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: ERROR oslo.service.loopingcall [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] Fixed interval looping call 'os_brick.initiator.connectors.fibre_channel._wait_for_device_discovery' failed: TypeError: 'NoneType' object is not iterable
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: ERROR oslo.service.loopingcall Traceback (most recent call last):
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: ERROR oslo.service.loopingcall File "/usr/local/lib/python2.7/dist-packages/oslo_service/loopingcall.py", line 193, in _run_loop
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: ERROR oslo.service.loopingcall result = func(*self.args, **self.kw)
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: ERROR oslo.service.loopingcall File "/usr/local/lib/python2.7/dist-packages/os_brick/initiator/connectors/fibre_channel.py", line 160, in _wait_for_device_discovery
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: ERROR oslo.service.loopingcall self._linuxfc.rescan_hosts(hbas, connection_properties)
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: ERROR oslo.service.loopingcall File "/usr/local/lib/python2.7/dist-packages/os_brick/initiator/linuxfc.py", line 112, in rescan_hosts
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: ERROR oslo.service.loopingcall for hba_channel, target_id in cts:
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: ERROR oslo.service.loopingcall TypeError: 'NoneType' object is not iterable
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: ERROR oslo.service.loopingcall
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG oslo_concurrency.lockutils [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] Lock "connect_volume" released by "os_brick.initiator.connectors.fibre_channel.connect_volume" :: held 1.377s {{(pid=21757) inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:285}}
May 30 22:13:04 fc-cinder-dev-3 nova-compute[21757]: DEBUG os_brick.initiator.connectors.fibre_channel [None req-0e8ec476-e7ce-4235-bdb3-4d7e53dc198b admin admin] <== connect_volume: exception (1378ms) TypeError("'NoneType' object is not iterable",) {{(pid=21757) trace_logging_wrapper /usr/local/lib/python2.7/dist-packages/os_brick/utils.py:156}}

The issue is that rescan_hosts() builds a list of tuples called "process", each containing (hba, cts), where cts comes from _get_hba_channel_scsi_target(). That helper returns None here because the /sys/class/fc_transport/* paths don't exist yet, so iterating over cts raises the TypeError above. Other code paths already handle this with something like cts = cts or [('-', '-')] for the case where _get_hba_channel_scsi_target() returns None.

Changed in os-brick:
assignee: nobody → Patrick East (patrick-east)
Changed in os-brick:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on os-brick (master)

Change abandoned by Patrick East (<email address hidden>) on branch: master
Review: https://review.openstack.org/571332

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-brick (master)

Fix proposed to branch: master
Review: https://review.openstack.org/571910

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-brick (master)

Reviewed: https://review.openstack.org/571910
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=50ef90251c474b69665fd49444db6ae06711f408
Submitter: Zuul
Branch: master

commit 50ef90251c474b69665fd49444db6ae06711f408
Author: Patrick East <email address hidden>
Date: Fri Jun 1 18:23:09 2018 -0700

    FC fix for scanning only connected HBA's

    When provided an initiator target map and attempting a rescan
    in LinuxFibreChannel.rescan_hosts(...) if the system doesn't
    yet have /sys/class/fc_host/fc_transport/** directories then
    we can get into a position where the call to
    LinuxFibreChannel._get_hba_channel_scsi_target will return
    "None". Later on when we iterate through "process" and attempt
    to iterate on "cts" this will raise an exception. We used to
    fall back to a wildcard scan, and still do if no target map
    is provided. This fix will do just that, for cases where we
    don't have enough other information we just use wildcards.

    In testing it seems like this happens on first-time connections
    to a target after the ACLs have been setup on the arrays for
    the initiator. After that we can get the HBA channel and SCSI
    target with the grep call in _get_hba_channel_scsi_target

    Change-Id: Ifd79b055882bb513fccf21d584baaeb1d60e67f2
    Closes-Bug: #1774283
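On a real system each resolved (channel, target) pair, or the wildcard fallback, becomes a write to the host's sysfs scan file, where '-' is the SCSI midlayer's wildcard. A hedged sketch of that fallback (the sysfs_root parameter is added here for testability; os-brick writes under /sys as root via privsep):

```python
import os


def scan_string(channel='-', target_id='-', lun='-'):
    # '-' is the SCSI midlayer wildcard: scan all channels/targets/LUNs.
    return '%s %s %s' % (channel, target_id, lun)


def scan_host(host_device, cts, lun, sysfs_root='/sys/class/scsi_host'):
    """Issue one scan request per resolved (channel, target) pair.

    When cts is None (the fc_transport entries are not present yet),
    fall back to a single wildcard scan instead of raising TypeError.
    """
    cts = cts or [('-', '-')]
    scan_path = os.path.join(sysfs_root, host_device, 'scan')
    for channel, target_id in cts:
        with open(scan_path, 'w') as f:  # requires root on a real host
            f.write(scan_string(channel, target_id, lun))
```

The wildcard scan is broader (it probes every channel and target on the HBA) but is the same behavior os-brick had before the initiator-target-map optimization, so it is a safe fallback.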

Changed in os-brick:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-brick 2.5.1

This issue was fixed in the openstack/os-brick 2.5.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-brick (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/589539

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-brick (stable/queens)

Reviewed: https://review.openstack.org/589539
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=e211056d650b9be9f7501b1497416c8fc14312d5
Submitter: Zuul
Branch: stable/queens

commit e211056d650b9be9f7501b1497416c8fc14312d5
Author: Patrick East <email address hidden>
Date: Fri Jun 1 18:23:09 2018 -0700

    FC fix for scanning only connected HBA's

    When provided an initiator target map and attempting a rescan
    in LinuxFibreChannel.rescan_hosts(...) if the system doesn't
    yet have /sys/class/fc_host/fc_transport/** directories then
    we can get into a position where the call to
    LinuxFibreChannel._get_hba_channel_scsi_target will return
    "None". Later on when we iterate through "process" and attempt
    to iterate on "cts" this will raise an exception. We used to
    fall back to a wildcard scan, and still do if no target map
    is provided. This fix will do just that, for cases where we
    don't have enough other information we just use wildcards.

    In testing it seems like this happens on first-time connections
    to a target after the ACLs have been setup on the arrays for
    the initiator. After that we can get the HBA channel and SCSI
    target with the grep call in _get_hba_channel_scsi_target

    Change-Id: Ifd79b055882bb513fccf21d584baaeb1d60e67f2
    Closes-Bug: #1774283
    (cherry picked from commit 50ef90251c474b69665fd49444db6ae06711f408)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-brick 2.3.3

This issue was fixed in the openstack/os-brick 2.3.3 release.
