Running parallel iSCSI/LVM c-vol backends is causing random failures in CI

Bug #1917750 reported by Lee Yarwood on 2021-03-04
Affects: Cinder
Importance: High
Assigned to: Lee Yarwood

Bug Description

As per $subject, we noticed this while debugging an encrypted volume test failure where the volume somehow became unencrypted just as QEMU attempted to attach it. Looking through the logs, I noticed two parallel requests to attach volumes at the time that were served by different c-vol backends.

The following is for our encrypted volume, from the 10.4.70.186:3260 portal:

https://040bad29f060e1f76339-88856c79572ad1783ad6e63321e57df6.ssl.cf2.rackcdn.com/761452/11/check/nova-next/fdaaa19/compute1/logs/screen-n-cpu.txt

43354 Mar 03 20:53:06.469770 ubuntu-focal-limestone-regionone-0023285105 nova-compute[53948]: DEBUG os_brick.initiator.connectors.iscsi [None req-66e30ff7-e88c-41ea-a6a4-46c71c9e31ed tempest-TestEncryptedCinderVolumes-409494569 tempest-TestEncryptedCinderVolumes-409494569-project] ==> connect_volume: call "{'args': (<os_brick.initiator.connectors.iscsi.ISCSIConnector object at 0x7fae3034b520>, {'target_discovered': False, 'target_portal': '10.4.70.186:3260', 'target_iqn': 'iqn.2010-10.org.openstack:volume-bf8c3d96-627d-40e1-ab47-56ed93fdcd3e', 'target_lun': 1, 'volume_id': 'bf8c3d96-627d-40e1-ab47-56ed93fdcd3e', 'auth_method': 'CHAP', 'auth_username': 'njEARKnLNcLdjg9qL3di', 'auth_password': '***', 'encrypted': True, 'qos_specs': None, 'access_mode': 'rw', 'cacheable': False}), 'kwargs': {}}" {{(pid=53948) trace_logging_wrapper /usr/local/lib/python3.8/dist-packages/os_brick/utils.py:147}}
[..]
43387 Mar 03 20:53:07.637853 ubuntu-focal-limestone-regionone-0023285105 nova-compute[53948]: DEBUG nova.virt.libvirt.volume.iscsi [None req-66e30ff7-e88c-41ea-a6a4-46c71c9e31ed tempest-TestEncryptedCinderVolumes-409494569 tempest-TestEncryptedCinderVolumes-409494569-project] Attached iSCSI volume {'type': 'block', 'scsi_wwn': '360000000000000000e00000000010001', 'path': '/dev/disk/by-id/scsi-360000000000000000e00000000010001'} {{(pid=53948) connect_volume /opt/stack/nova/nova/virt/libvirt/volume/iscsi.py:65}}

Then another, for a different test, from the 10.4.70.243:3260 portal:

https://040bad29f060e1f76339-88856c79572ad1783ad6e63321e57df6.ssl.cf2.rackcdn.com/761452/11/check/nova-next/fdaaa19/compute1/logs/screen-n-cpu.txt

43418 Mar 03 20:53:11.790157 ubuntu-focal-limestone-regionone-0023285105 nova-compute[53948]: DEBUG os_brick.initiator.connectors.iscsi [None req-1d89a295-e88b-4e20-92f4-a869856e21a3 tempest-TestStampPattern-2018003410 tempest-TestStampPattern-2018003410-project] ==> connect_volume: call "{'args': (<os_brick.initiator.connectors.iscsi.ISCSIConnector object at 0x7fae3034b520>, {'target_discovered': False, 'target_portal': '10.4.70.243:3260', 'target_iqn': 'iqn.2010-10.org.openstack:volume-afe9efc1-3914-40d8-854d-cfbcf3603188', 'target_lun': 1, 'volume_id': 'afe9efc1-3914-40d8-854d-cfbcf3603188', 'auth_method': 'CHAP', 'auth_username': 'AzBZtUgYDZ6bRCuHzFRQ', 'auth_password': '***', 'encrypted': False, 'qos_specs': None, 'access_mode': 'rw', 'cacheable': False}), 'kwargs': {}}" {{(pid=53948) trace_logging_wrapper /usr/local/lib/python3.8/dist-packages/os_brick/utils.py:147}}
[..]
43489 Mar 03 20:53:12.998566 ubuntu-focal-limestone-regionone-0023285105 nova-compute[53948]: DEBUG nova.virt.libvirt.volume.iscsi [None req-1d89a295-e88b-4e20-92f4-a869856e21a3 tempest-TestStampPattern-2018003410 tempest-TestStampPattern-2018003410-project] Attached iSCSI volume {'type': 'block', 'scsi_wwn': '360000000000000000e00000000010001', 'path': '/dev/sdb'} {{(pid=53948) connect_volume /opt/stack/nova/nova/virt/libvirt/volume/iscsi.py:65}}

Note that each attach reports the same 360000000000000000e00000000010001 scsi_wwn even though the volumes are provided by independent iSCSI/LVM c-vol backends and the /dev/sd paths differ.

I've never fully understood why we run c-vol on the computes, tbh, but unless they can coordinate or otherwise avoid duplicate WWNs then I guess we can't run them in parallel?

Lee Yarwood (lyarwood) wrote :

Another data point, this time in the nova-live-migration job. Again I see two volumes attached from different backends with the same WWN; this time os-brick returns the plain sd devices rather than the device-mapper links:

https://zuul.opendev.org/t/openstack/build/fb643b53835341ac8589afeadfa7044d/log/compute1/logs/screen-n-cpu.txt

Mar 08 22:29:56.730776 ubuntu-focal-rax-dfw-0023379061 nova-compute[53492]: DEBUG os_brick.initiator.connectors.iscsi [None req-a7c2d69f-2b26-46a5-9588-0500ed40ecf2 tempest-LiveMigrationTest-303521084 tempest-LiveMigrationTest-303521084-project] Connected to sda using {'target_discovered': False, 'target_portal': '10.208.224.42:3260', 'target_iqn': 'iqn.2010-10.org.openstack:volume-3d034a25-2301-44f7-b10a-7e0a46402c29', 'target_lun': 1, 'volume_id': '3d034a25-2301-44f7-b10a-7e0a46402c29', 'auth_method': 'CHAP', 'auth_username': 'V2L87oQWgpAL7wPMRFng', 'auth_password': '***', 'encrypted': False, 'qos_specs': None, 'access_mode': 'rw', 'cacheable': False} {{(pid=53492) _connect_vol /usr/local/lib/python3.8/dist-packages/os_brick/initiator/connectors/iscsi.py:675}}
Mar 08 22:29:56.732310 ubuntu-focal-rax-dfw-0023379061 nova-compute[53492]: DEBUG os_brick.initiator.connectors.iscsi [None req-a7c2d69f-2b26-46a5-9588-0500ed40ecf2 tempest-LiveMigrationTest-303521084 tempest-LiveMigrationTest-303521084-project] <== connect_volume: return (1163ms) {'type': 'block', 'scsi_wwn': '360000000000000000e00000000010001', 'path': '/dev/sda'} {{(pid=53492) trace_logging_wrapper /usr/local/lib/python3.8/dist-packages/os_brick/utils.py:171}}
Mar 08 22:29:56.732775 ubuntu-focal-rax-dfw-0023379061 nova-compute[53492]: DEBUG nova.virt.libvirt.volume.iscsi [None req-a7c2d69f-2b26-46a5-9588-0500ed40ecf2 tempest-LiveMigrationTest-303521084 tempest-LiveMigrationTest-303521084-project] Attached iSCSI volume {'type': 'block', 'scsi_wwn': '360000000000000000e00000000010001', 'path': '/dev/sda'} {{(pid=53492) connect_volume /opt/stack/nova/nova/virt/libvirt/volume/iscsi.py:65}}

Mar 08 22:30:02.484937 ubuntu-focal-rax-dfw-0023379061 nova-compute[53492]: DEBUG os_brick.initiator.connectors.iscsi [None req-2ade7ce2-8fca-4d71-8c30-496fa8e8fce4 tempest-LiveAutoBlockMigrationV225Test-766901728 tempest-LiveAutoBlockMigrationV225Test-766901728-project-admin] Connected to sdb using {'target_discovered': False, 'target_portal': '10.208.224.30:3260', 'target_iqn': 'iqn.2010-10.org.openstack:volume-2384d911-102d-4b91-bd63-4c73baabd6b6', 'target_lun': 1, 'volume_id': '2384d911-102d-4b91-bd63-4c73baabd6b6', 'auth_method': 'CHAP', 'auth_username': 'rpEQU9A9uBRqDkLWVwkL', 'auth_password': '***', 'encrypted': False, 'qos_specs': None, 'access_mode': 'rw', 'cacheable': False} {{(pid=53492) _connect_vol /usr/local/lib/python3.8/dist-packages/os_brick/initiator/connectors/iscsi.py:675}}
Mar 08 22:30:02.487079 ubuntu-focal-rax-dfw-0023379061 nova-compute[53492]: DEBUG os_brick.initiator.connectors.iscsi [None req-2ade7ce2-8fca-4d71-8c30-496fa8e8fce4 tempest-LiveAutoBlockMigrationV225Test-766901728 tempest-LiveAutoBlockMigrationV225Test-766901728-project-admin] <== connect_volume: return (1164ms) {'type': 'block', 'scsi_wwn': '360000000000000000e00000000010001', 'path...


Lee Yarwood (lyarwood) wrote :

Fun, so this appears to be the result of the tgtd backend still being used by Ubuntu-based CI hosts. On a Fedora 32 based multinode env I've just deployed using lioadm, I see different WWNs for each volume from the different LVM/iSCSI c-vol backends:

$ ll /dev/disk/by-id/wwn-0x6001405*
lrwxrwxrwx. 1 root root 9 Mar 9 13:31 /dev/disk/by-id/wwn-0x600140525061514ba074fd880aac7d00 -> ../../sdb
lrwxrwxrwx. 1 root root 9 Mar 9 13:32 /dev/disk/by-id/wwn-0x60014056e6d98ce391f402ea4e6f4e29 -> ../../sdc

I'll rebuild with Focal now and confirm the behaviour with tgtd.

Lee Yarwood (lyarwood) wrote :

I've reproduced the single-WWN part on a Focal based multinode env, but I've not yet reproduced the odd detach behaviour seen in the nova-live-migration job.

$ openstack server create --flavor 1 --image cirros-0.5.1-x86_64-disk --network private test

Two volumes are created in the two different backends:

$ openstack volume create --size 1 test-1
$ openstack volume create --size 1 test-2

stack@devstack-focal-ctrl:~/devstack$ sudo lvs | grep volume-
  volume-12e58e2e-41cb-46dd-b294-4feb836f9434 stack-volumes-lvmdriver-1 Vwi-aotz-- 1.00g stack-volumes-lvmdriver-1-pool 0.00

stack@devstack-focal-cpu:~/devstack$ sudo lvs | grep volume-
  volume-3e72097c-6281-4cbb-b676-337eb7e267d1 stack-volumes-lvmdriver-1 Vwi-aotz-- 1.00g stack-volumes-lvmdriver-1-pool 0.00

Attached to a test instance:

$ openstack server add volume test test-1
$ openstack server add volume test test-2

os-brick has told n-cpu to attach /dev/sd{b,c}:

$ sudo virsh domblklist 5f28ff81-93d6-48e9-bdca-7b67d3eea835
 Target Source
------------------------------------------------------------------------------------
 vda /opt/stack/data/nova/instances/5f28ff81-93d6-48e9-bdca-7b67d3eea835/disk
 vdb /dev/sdb
 vdc /dev/sdc

However we only see a single WWN under /dev/disk/by-id/wwn*:

$ ll /dev/disk/by-id/wwn-*
lrwxrwxrwx 1 root root 10 Mar 9 21:08 /dev/disk/by-id/wwn-0x60000000000000000e00000000010001 -> ../../dm-5

Oddly I also see the devices being picked up as multipath devices because of the duplicate WWN?!

$ lsblk
[..]
sdb 8:16 0 1G 0 disk
└─mpatha 253:5 0 1G 0 mpath
sdc 8:32 0 1G 0 disk
└─mpatha 253:5 0 1G 0 mpath

I didn't configure multipathd on this deployment, yet here it is configured and providing an mpath device:

$ sudo multipath -ll
mpatha (360000000000000000e00000000010001) dm-5 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 7:0:0:1 sdb 8:16 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 8:0:0:1 sdc 8:32 active ready running
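(Not something tried in this report, but if multipathd keeps folding the duplicate-WWN devices into a single map on a test box, a hypothetical workaround would be to blacklist that wwid in /etc/multipath.conf — the wwid below is the duplicate one from the logs:

```
blacklist {
    wwid "360000000000000000e00000000010001"
}
```

followed by `sudo multipath -F` to flush the existing unused maps so the blacklist takes effect.)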

Lee Yarwood (lyarwood) wrote :

Final update today: with mpath disabled I get the behaviour seen in c#0 when the encrypted volume test failed:

$ openstack --os-compute-api-version 2.latest server create --flavor 1 --image cirros-0.5.1-x86_64-disk --host devstack-focal-cpu --network private test-1
$ openstack --os-compute-api-version 2.latest server create --flavor 1 --image cirros-0.5.1-x86_64-disk --host devstack-focal-cpu --network private test-2
$ openstack volume create --size 1 test-1
$ openstack volume create --size 1 test-2
$ openstack server add volume test-1 test-1

stack@devstack-focal-cpu:~/devstack $ ll /dev/disk/by-id/wwn-*
lrwxrwxrwx 1 root root 9 Mar 9 22:42 /dev/disk/by-id/wwn-0x60000000000000000e00000000010001 -> ../../sdb

$ openstack server add volume test-2 test-2

stack@devstack-focal-cpu:~/devstack $ ll /dev/disk/by-id/wwn-*
lrwxrwxrwx 1 root root 9 Mar 9 22:42 /dev/disk/by-id/wwn-0x60000000000000000e00000000010001 -> ../../sdc

I can't however reproduce any of the detach failures seen in the nova-live-migration job.

After a brief exchange on the ML I've posted the following WIP to move Focal based CI envs over to lioadm to try to avoid this:

WIP cinder: Default CINDER_ISCSI_HELPER to lioadm on Ubuntu
https://review.opendev.org/c/openstack/devstack/+/779624
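
Until that change merges, individual devstack deployments can opt in themselves; a minimal local.conf fragment would be:

```ini
[[local|localrc]]
CINDER_ISCSI_HELPER=lioadm
```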

Changed in cinder:
status: New → In Progress
assignee: nobody → Lee Yarwood (lyarwood)
importance: Undecided → High

Gorka Eguileor (gorka) wrote :

I am no expert on STGT, since I always work with LIO, but from what I
could gather this seems to be caused by the combination of us:

- Using the tgtadm helper
- Having 2 different cinder-volume services running on 2 different hosts
  (one on the compute and another on the controller).
- Using the same volume_backend_name for both LVM backends.

If we were running a single cinder-volume service with 2 backends this
issue wouldn't happen (I checked).
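
For reference, a single-service multibackend layout along those lines would look roughly like the cinder.conf sketch below (the backend names, volume group names and the target_helper value here are illustrative, not taken from this report):

```ini
[DEFAULT]
enabled_backends = lvm-a,lvm-b

[lvm-a]
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_group = stack-volumes-a
volume_backend_name = lvm-a
target_helper = lioadm

[lvm-b]
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_group = stack-volumes-b
volume_backend_name = lvm-b
target_helper = lioadm
```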

If we used a different volume_backend_name for each of the 2 services
and used a volume type picking one of them for the operations, this
wouldn't happen either.

If we used LIO instead, this wouldn't happen.

The cause is the automatic generation of the serial/WWN for volumes by
STGT, which appears to be deterministic. The first target created on a
host will have a 60000000000000000e0000000001 prefix followed by the LUN
number (the leading 3 that we see in the connection_info just states
that the WWN is of NAA type).

This means that the first volume exposed by STGT on any host will ALWAYS
have the same WWN, and that will mess things up if we attach them to the
same host: the whole premise of a WWN is its uniqueness, and everything
in Cinder and os-brick assumes this; that assumption will not be changed.
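
To make that structure concrete, the duplicated WWN from the logs splits up as follows (a bash sketch; the offsets simply carve the 33-character value into the parts described above):

```shell
#!/bin/bash
# Split the NAA WWN reported by os-brick into its parts.
wwn="360000000000000000e00000000010001"

echo "designator type: ${wwn:0:1}"   # 3 -> NAA-type designator
echo "host prefix:     ${wwn:1:28}"  # deterministic per-host STGT prefix
echo "lun:             ${wwn:29:4}"  # LUN number (here LUN 1)
```

Since the first target on every host gets the same prefix and the same LUN number, the full WWN collides across hosts.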

For LIO it seems that the generation of the serial/WWN is
non-deterministic (or at least not the same on all hosts), so the issue
won't happen in this specific deployment configuration.

So the options to prevent this issue are to run both backends on the
controller node, use different volume_backend_name and a volume type, or
use LIO.

Lee Yarwood (lyarwood) wrote :

I'm not entirely sure how using a different volume_backend_name would help? As you say above, the first target on both hosts would still have the 60000000000000000e0000000001 prefix regardless of the name, right?

Moving to a single-service multibackend approach would be best, but given the required job changes etc. it isn't something I think we can do in the short term.

Moving to lioadm is still my preferred short-term solution, with the following devstack change awaiting review:

cinder: Default CINDER_ISCSI_HELPER to lioadm on Ubuntu
https://review.opendev.org/c/openstack/devstack/+/779624
