Failing to create a volume from an image with the 3PAR FC driver

Bug #1812665 reported by Tzach Shefi
This bug affects 3 people
| Affects | Status | Importance | Assigned to | Milestone |
| os-brick | In Progress | Undecided | Unassigned | |

Bug Description

Attachment 1522157: cinder.conf plus Cinder logs

Description of problem: In a simple deployment (1 controller + 2 computes), creating an empty Cinder volume works. However, creating a volume from an image fails with the error described below.
I'm unsure whether this is a config issue or a possible driver bug.

Version-Release number of selected component (if applicable):
RHEL 7.6

puppet-cinder-13.3.1-0.20181013114721.25b1ba3.el7ost.noarch
openstack-cinder-13.0.1-0.20181013185427.31ff628.el7ost.noarch
python2-cinderclient-4.0.1-0.20180809133302.460229c.el7ost.noarch
python-cinder-13.0.1-0.20181013185427.31ff628.el7ost.noarch
python2-os-brick-2.5.3-0.20180816081254.641337b.el7ost.noarch

python-nova-18.0.3-0.20181011032838.d1243fe.el7ost.noarch
openstack-nova-api-18.0.3-0.20181011032838.d1243fe.el7ost.noarch
puppet-nova-13.3.1-0.20181013120143.8ab435c.el7ost.noarch
python2-novaclient-11.0.0-0.20180809174649.f1005ce.el7ost.noarch
openstack-nova-common-18.0.3-0.20181011032838.d1243fe.el7ost.noarch
python-novajoin-1.0.21-1.el7ost.noarch

3par - HPE_3PAR 8200
HPE 3PAR OS version - 3.3.1.410 (MU2)+P32,P34,P37,P40,P41,P45

Cisco FC MDS switch 9148 - version 5.0(1a)

How reproducible:
Hit the same issue on two deployments (reusing the same hardware).
Then again, it might be an issue with my cloned config.

Steps to Reproduce:
1. Configure OpenStack 14 with 3PAR FC storage as the Cinder back end.

2. Creating an empty volume works fine
#cinder create 1 --volume-type 3parfc --name 3parEmptyVol7
The volume is created and available, cinder list ->
| 569d57ae-4a10-4fb6-9a9e-85f722ea9caf | available | 3parEmptyVol7 | 1 | 3parfc

Basic Cinder/3par access works fine

3. Creating a volume from an image (cirros) fails

#cinder create 1 --volume-type 3parfc --name 3parVolFromImage1 --image cirros
+--------------------------------+--------------------------------------+
| Property | Value |
+--------------------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2019-01-21T12:31:48.000000 |
| description | None |
| encrypted | False |
| id | 0fafa271-9b7b-4dcd-a98c-9143ef916afe |
..
| status | creating

But after a while we see that it failed to create.
#cinder list returns ->
| 0fafa271-9b7b-4dcd-a98c-9143ef916afe | error | 3parVolFromImage1 | 1 | 3parfc | false |

In the c-vol log I noticed an os-brick error ->
2019-01-21 12:32:13.400 70 ERROR os_brick.initiator.connectors.fibre_channel [-] Fibre Channel volume device not found.
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'os_brick.initiator.connectors.fibre_channel._wait_for_device_discovery' failed: NoFibreChannelVolumeDeviceFound: Unable to find a Fibre Channel volume device.
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall Traceback (most recent call last):
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 171, in _run_loop
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall result = func(*self.args, **self.kw)
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall File "/usr/lib/python2.7/site-packages/os_brick/initiator/connectors/fibre_channel.py", line 219, in _wait_for_device_discovery
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall raise exception.NoFibreChannelVolumeDeviceFound()
2019-01-21 12:32:13.401 70 ERROR oslo.service.loopingcall NoFibreChannelVolumeDeviceFound: Unable to find a Fibre Channel volume device.
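The traceback shows a fixed-interval polling loop giving up on device discovery. The general pattern can be sketched as below. This is a simplified, standard-library-only illustration and not the os-brick implementation: `wait_for_device`, its parameters, and the locally defined exception class are hypothetical names chosen for this example.

```python
import os
import time


class NoFibreChannelVolumeDeviceFound(Exception):
    """Local stand-in for the exception named in the log (not the os-brick class)."""


def wait_for_device(candidate_paths, max_tries=3, interval=2.0,
                    exists=os.path.exists, sleep=time.sleep):
    """Poll until one of candidate_paths exists, or raise after max_tries.

    `exists` and `sleep` are injectable so the loop can be exercised
    without real devices or real delays.
    """
    for attempt in range(1, max_tries + 1):
        for path in candidate_paths:
            if exists(path):
                return path
        # Wait between attempts, but not after the final one.
        if attempt < max_tries:
            sleep(interval)
    raise NoFibreChannelVolumeDeviceFound(
        "Unable to find a Fibre Channel volume device.")
```

If no rescan ever makes the device node appear, every poll fails and the loop ends with the exception, which matches the symptom in the log above.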

On the controller's bare-metal host, I installed sysfsutils and ran systool. I get the same output (below) when I run systool inside the c-vol Docker container.
#yum install sysfsutils
#systool -c fc_host -v

[root@controller-0 cinder]# systool -c fc_host -v
Class = "fc_host"

  Class Device = "host6"
  Class Device path = "/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/host6/fc_host/host6"
    dev_loss_tmo = "16"
    fabric_name = "0x2002000573a558d1"
    issue_lip = <store method only>
    max_npiv_vports = "254"
    node_name = "0x50014380186af83d"
    npiv_vports_inuse = "0"
    port_id = "0x6b1000"
    port_name = "0x50014380186af83c"
    port_state = "Online"
    port_type = "NPort (fabric via point-to-point)"
    speed = "8 Gbit"
    supported_classes = "Class 3"
    supported_speeds = "1 Gbit, 2 Gbit, 4 Gbit, 8 Gbit"
    symbolic_name = "HPAJ764A FW:v8.07.00 DVR:v10.00.00.06.07.6-k"
    system_hostname = ""
    tgtid_bind_type = "wwpn (World Wide Port Name)"
    uevent =
    vport_create = <store method only>
    vport_delete = <store method only>

    Device = "host6"
    Device path = "/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.0/host6"
      fw_dump =
      issue_logo = <store method only>
      nvram = "ISP "
      optrom_ctl = <store method only>
      optrom =
      reset = <store method only>
      sfp = ""
      uevent = "DEVTYPE=scsi_host"
      vpd = "�$"

  Class Device = "host7"
  Class Device path = "/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.1/host7/fc_host/host7"
    dev_loss_tmo = "16"
    fabric_name = "0x2002000573a558d1"
    issue_lip = <store method only>
    max_npiv_vports = "254"
    node_name = "0x50014380186af83f"
    npiv_vports_inuse = "0"
    port_id = "0x6b0a00"
    port_name = "0x50014380186af83e"
    port_state = "Online"
    port_type = "NPort (fabric via point-to-point)"
    speed = "8 Gbit"
    supported_classes = "Class 3"
    supported_speeds = "1 Gbit, 2 Gbit, 4 Gbit, 8 Gbit"
    symbolic_name = "HPAJ764A FW:v8.07.00 DVR:v10.00.00.06.07.6-k"
    system_hostname = ""
    tgtid_bind_type = "wwpn (World Wide Port Name)"
    uevent =
    vport_create = <store method only>
    vport_delete = <store method only>

    Device = "host7"
    Device path = "/sys/devices/pci0000:00/0000:00:02.0/0000:04:00.1/host7"
      fw_dump =
      issue_logo = <store method only>
      nvram = "ISP "
      optrom_ctl = <store method only>
      optrom =
      reset = <store method only>
      sfp = ""
      uevent = "DEVTYPE=scsi_host"
      vpd = "�$"
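For anyone who wants the same key attributes without installing sysfsutils, the values systool prints for the fc_host class can be read directly from sysfs. The following is a minimal sketch, assuming the standard `/sys/class/fc_host` layout; the function name `read_fc_hosts` and the choice of attributes are illustrative only.

```python
import glob
import os


def read_fc_hosts(sysfs_root="/sys/class/fc_host"):
    """Return {host_name: {attribute: value}} for each FC host under sysfs_root."""
    hosts = {}
    for host_dir in sorted(glob.glob(os.path.join(sysfs_root, "host*"))):
        attrs = {}
        for name in ("port_name", "node_name", "port_state"):
            try:
                with open(os.path.join(host_dir, name)) as fh:
                    attrs[name] = fh.read().strip()
            except OSError:
                # Attribute missing or unreadable on this host.
                attrs[name] = None
        hosts[os.path.basename(host_dir)] = attrs
    return hosts


if __name__ == "__main__":
    for host, attrs in read_fc_hosts().items():
        print(host, attrs)
```

Running it both on the bare-metal host and inside the c-vol container is a quick way to confirm that both see the same HBAs.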

4. Attaching an empty volume to an instance works.
Attaching a volume failed on my previous system; I'm unsure why.
But it's working now, so that's a good sign of progress.

Nova instance booted/running ->
| d38e10e4-a937-4c9d-bbac-8bb708f6ac96 | inst1 | ACTIVE | - | Running

Attach the empty volume created in step 2 to the instance:

#nova volume-attach d38e10e4-a937-4c9d-bbac-8bb708f6ac96 569d57ae-4a10-4fb6-9a9e-85f722ea9caf auto
+----------+--------------------------------------+
| Property | Value |
+----------+--------------------------------------+
| device | /dev/vdb |
| id | 569d57ae-4a10-4fb6-9a9e-85f722ea9caf |
| serverId | d38e10e4-a937-4c9d-bbac-8bb708f6ac96 |
| volumeId | 569d57ae-4a10-4fb6-9a9e-85f722ea9caf |
+----------+--------------------------------------+

The volume is attached, cinder list ->
| 569d57ae-4a10-4fb6-9a9e-85f722ea9caf | in-use | 3parEmptyVol7 | 1 | 3parfc | false | d38e10e4-a937-4c9d-bbac-8bb708f6ac96 |

Actual results:
Failing to create a 3par FC vol from image.

Additional info:

All controller/compute nodes as well as 3par's 4 FC links reside in the same FC zone.
Prior to installing Openstack, I'd successfully attached an FC volume to one of the hosts. So I gather FC zoning is fine.
All hosts belong to same rhos-fc host set on 3par.

The FC switch is a Cisco MDS running NX-OS version 5.0(1a).
Given that all ports belong to the same FC zone, I'm not sure whether I need to configure Cinder's zone manager.
I noticed that the Cinder FC zone manager requires Cisco MDS NX-OS Release 6.2(9) or later, which is later than my current switch version.

Just in case here is the zone info
zone name hp_3par_cougar07_08_09_16 vsan 2
    member pwwn 21:00:00:1b:32:82:22:9e
    member pwwn 21:01:00:1b:32:a2:22:9e
    member pwwn 51:40:2e:c0:01:7c:3a:d8
    member pwwn 51:40:2e:c0:01:7c:38:6c
    member pwwn 21:01:00:e0:8b:a7:fd:10
    member pwwn 50:01:43:80:18:6a:f8:3e
    member pwwn 51:40:2e:c0:01:7c:38:6e
    member pwwn 21:00:00:24:ff:55:c3:c0
    member pwwn 21:00:00:24:ff:55:c3:c4
    member pwwn 21:00:00:24:ff:55:c3:c5
    member pwwn 20:01:00:02:ac:02:1f:6b
    member pwwn 20:02:00:02:ac:02:1f:6b
    member pwwn 21:01:00:02:ac:02:1f:6b
    member pwwn 21:02:00:02:ac:02:1f:6b

The last four entries (ending in 6b) are the 3PAR's four FC ports.
All the other WWNs belong to dual-port FC HBAs attached to my controllers/computes.
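A quick way to cross-check the zone listing against the systool output is to normalize both WWPN notations and compare them. The sketch below uses only values posted in this report; the helper names `normalize_wwpn` and `ports_missing_from_zone` are made up for this example. With the posted data, only host7's WWPN appears in the zone listing. Whether that has any bearing on the failure is not established here.

```python
def normalize_wwpn(value):
    """Convert '0x5001...' or '50:01:...' into bare lowercase hex digits."""
    text = value.strip().lower()
    if text.startswith("0x"):
        text = text[2:]
    return text.replace(":", "")


# Zone members as posted in the report.
ZONE_MEMBERS = [
    "21:00:00:1b:32:82:22:9e", "21:01:00:1b:32:a2:22:9e",
    "51:40:2e:c0:01:7c:3a:d8", "51:40:2e:c0:01:7c:38:6c",
    "21:01:00:e0:8b:a7:fd:10", "50:01:43:80:18:6a:f8:3e",
    "51:40:2e:c0:01:7c:38:6e", "21:00:00:24:ff:55:c3:c0",
    "21:00:00:24:ff:55:c3:c4", "21:00:00:24:ff:55:c3:c5",
    "20:01:00:02:ac:02:1f:6b", "20:02:00:02:ac:02:1f:6b",
    "21:01:00:02:ac:02:1f:6b", "21:02:00:02:ac:02:1f:6b",
]

# port_name values from the systool output on controller-0.
HOST_PORTS = {
    "host6": "0x50014380186af83c",
    "host7": "0x50014380186af83e",
}


def ports_missing_from_zone(host_ports, zone_members):
    """Return the sorted host names whose WWPN is not a zone member."""
    zoned = {normalize_wwpn(member) for member in zone_members}
    return sorted(host for host, wwpn in host_ports.items()
                  if normalize_wwpn(wwpn) not in zoned)


print(ports_missing_from_zone(HOST_PORTS, ZONE_MEMBERS))  # prints ['host6']
```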

Revision history for this message
Tzach Shefi (tshefi) wrote :
Sneha Rai (sneharai4)
Changed in cinder:
assignee: nobody → Sneha Rai (sneharai4)
assignee: Sneha Rai (sneharai4) → nobody
Prasanna (prablr79)
Changed in cinder:
status: New → In Progress
Changed in cinder:
assignee: nobody → Raghavendra Tilay (raghavendrat)
Tzach Shefi (tshefi) wrote :

We found the issue: it turns out that os-brick and HP's AJ764A HBA don't play nicely together.

I had this same HP HBA in my controller and in one of my compute nodes.
On both of them I hit the same error:
NoFibreChannelVolumeDeviceFound: Unable to find a Fibre Channel volume device.

Creating a volume from an image failed, attaching a volume failed, and backing up a volume failed.

However, on a second compute node, attaching a volume worked; it just so happens that this compute node uses another type of HBA. Working on a hunch, I swapped my controller's HBA for a QLogic one, and creating a volume from an image now works.

The odd thing is that the HP AJ764A HBA is based on the same QLogic QLE2526 chip as my "working" HBA.
So with the same chipset, the QLogic-branded HBA works while the HP clone doesn't.

Changed in cinder:
assignee: Raghavendra Tilay (raghavendrat) → nobody
affects: cinder → os-brick
Raghavendra Tilay (raghavendrat) wrote :

Code changes need to be done in os_brick/initiator/linuxfc.py

In the function "rescan_hosts", the line below needs to be removed:
hbas = [hba for hba in hbas if hba['port_name'] in ports]

Only then would all the HBAs be considered.

Also, between the two for loops, the value of ctls can be checked as shown below:

for hba, ctls in process:
    if ctls is not None:
        for hba_channel, target_id, target_lun in ctls:

Someone from the os-brick team can provide a further update.
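The two suggestions above (no filtering of HBAs by port_name, plus a None check on ctls) can be illustrated with a standalone sketch. This is not the os-brick code: `plan_rescans`, the `get_ctls` callable, and the shape of the HBA dictionaries are assumptions made for this example, and it only collects the tuples that would be rescanned instead of touching sysfs.

```python
def plan_rescans(hbas, get_ctls):
    """Collect (host_device, channel, target_id, lun) tuples to rescan.

    hbas: list of dicts with at least a 'host_device' key (assumed shape).
    get_ctls: hypothetical lookup returning a list of
        (hba_channel, target_id, target_lun) tuples for an HBA, or None
        when nothing could be determined for it.
    """
    # Every HBA is considered; there is no filtering on port_name here.
    process = [(hba, get_ctls(hba)) for hba in hbas]

    planned = []
    for hba, ctls in process:
        # Guard against a missing result before iterating over it.
        if ctls is not None:
            for hba_channel, target_id, target_lun in ctls:
                planned.append(
                    (hba["host_device"], hba_channel, target_id, target_lun))
    return planned
```

For example, with one HBA whose lookup returns None and another that returns a single tuple, only the second contributes an entry, and no TypeError is raised for the first.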

Raghavendra Tilay (raghavendrat) wrote :

Hi Shefi,

You mentioned that creation of a bootable volume worked with the second compute node.
I would like to know:
(i) did you enable multipath?
OR
(ii) did you use first n:s:p from 3par backend?

Raghavendra Tilay (raghavendrat) wrote :

We are currently working on this issue.
It is being tracked via: https://bugs.launchpad.net/cinder/+bug/1809249
Also submitted code changes for community review: https://review.opendev.org/#/c/657585/

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 15.0.0.0rc1

This issue was fixed in the openstack/cinder 15.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 14.0.2

This issue was fixed in the openstack/cinder 14.0.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 13.0.7

This issue was fixed in the openstack/cinder 13.0.7 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 12.0.10

This issue was fixed in the openstack/cinder 12.0.10 release.
