Running parallel iSCSI/LVM c-vol backends is causing random failures in CI

Bug #1917750 reported by Lee Yarwood on 2021-03-04
Affects: Cinder
Importance: High
Assigned to: Lee Yarwood

Bug Description

As per $subject, we noticed this while debugging an encrypted volume test failure where the volume somehow became unencrypted just as QEMU attempted to attach it. Looking through the logs, I noticed two parallel requests to attach volumes at the time that were served by different c-vol backends.

The following is for our encrypted volume, from the 10.4.70.186:3260 portal:

https://040bad29f060e1f76339-88856c79572ad1783ad6e63321e57df6.ssl.cf2.rackcdn.com/761452/11/check/nova-next/fdaaa19/compute1/logs/screen-n-cpu.txt

43354 Mar 03 20:53:06.469770 ubuntu-focal-limestone-regionone-0023285105 nova-compute[53948]: DEBUG os_brick.initiator.connectors.iscsi [None req-66e30ff7-e88c-41ea-a6a4-46c71c9e31ed tempest-TestEncryptedCinderVolumes-409494569 tempest-TestEncryptedCinderVolumes-409494569-project] ==> connect_volume: call "{'args': (<os_brick.initiator.connectors.iscsi.ISCSIConnector object at 0x7fae3034b520>, {'target_discovered': False, 'target_portal': '10.4.70.186:3260', 'target_iqn': 'iqn.2010-10.org.openstack:volume-bf8c3d96-627d-40e1-ab47-56ed93fdcd3e', 'target_lun': 1, 'volume_id': 'bf8c3d96-627d-40e1-ab47-56ed93fdcd3e', 'auth_method': 'CHAP', 'auth_username': 'njEARKnLNcLdjg9qL3di', 'auth_password': '***', 'encrypted': True, 'qos_specs': None, 'access_mode': 'rw', 'cacheable': False}), 'kwargs': {}}" {{(pid=53948) trace_logging_wrapper /usr/local/lib/python3.8/dist-packages/os_brick/utils.py:147}}
[..]
43387 Mar 03 20:53:07.637853 ubuntu-focal-limestone-regionone-0023285105 nova-compute[53948]: DEBUG nova.virt.libvirt.volume.iscsi [None req-66e30ff7-e88c-41ea-a6a4-46c71c9e31ed tempest-TestEncryptedCinderVolumes-409494569 tempest-TestEncryptedCinderVolumes-409494569-project] Attached iSCSI volume {'type': 'block', 'scsi_wwn': '360000000000000000e00000000010001', 'path': '/dev/disk/by-id/scsi-360000000000000000e00000000010001'} {{(pid=53948) connect_volume /opt/stack/nova/nova/virt/libvirt/volume/iscsi.py:65}}

Then another, for a different test, from the 10.4.70.243:3260 portal:

https://040bad29f060e1f76339-88856c79572ad1783ad6e63321e57df6.ssl.cf2.rackcdn.com/761452/11/check/nova-next/fdaaa19/compute1/logs/screen-n-cpu.txt

43418 Mar 03 20:53:11.790157 ubuntu-focal-limestone-regionone-0023285105 nova-compute[53948]: DEBUG os_brick.initiator.connectors.iscsi [None req-1d89a295-e88b-4e20-92f4-a869856e21a3 tempest-TestStampPattern-2018003410 tempest-TestStampPattern-2018003410-project] ==> connect_volume: call "{'args': (<os_brick.initiator.connectors.iscsi.ISCSIConnector object at 0x7fae3034b520>, {'target_discovered': False, 'target_portal': '10.4.70.243:3260', 'target_iqn': 'iqn.2010-10.org.openstack:volume-afe9efc1-3914-40d8-854d-cfbcf3603188', 'target_lun': 1, 'volume_id': 'afe9efc1-3914-40d8-854d-cfbcf3603188', 'auth_method': 'CHAP', 'auth_username': 'AzBZtUgYDZ6bRCuHzFRQ', 'auth_password': '***', 'encrypted': False, 'qos_specs': None, 'access_mode': 'rw', 'cacheable': False}), 'kwargs': {}}" {{(pid=53948) trace_logging_wrapper /usr/local/lib/python3.8/dist-packages/os_brick/utils.py:147}}
[..]
43489 Mar 03 20:53:12.998566 ubuntu-focal-limestone-regionone-0023285105 nova-compute[53948]: DEBUG nova.virt.libvirt.volume.iscsi [None req-1d89a295-e88b-4e20-92f4-a869856e21a3 tempest-TestStampPattern-2018003410 tempest-TestStampPattern-2018003410-project] Attached iSCSI volume {'type': 'block', 'scsi_wwn': '360000000000000000e00000000010001', 'path': '/dev/sdb'} {{(pid=53948) connect_volume /opt/stack/nova/nova/virt/libvirt/volume/iscsi.py:65}}

Note that each attach reports the same 360000000000000000e00000000010001 scsi_wwn even though the volumes are provided by independent iSCSI/LVM c-vol backends and the /dev/sd paths differ.

I've never fully understood why we run c-vol on the computes, tbh, but unless they can coordinate or otherwise avoid duplicate WWNs then I guess we can't run them in parallel?

Lee Yarwood (lyarwood) wrote :

Another data point, this time in the nova-live-migration job. Again I see two volumes attached from different backends with the same WWN; this time os-brick returns the plain sd devices rather than the device-mapper links:

https://zuul.opendev.org/t/openstack/build/fb643b53835341ac8589afeadfa7044d/log/compute1/logs/screen-n-cpu.txt

Mar 08 22:29:56.730776 ubuntu-focal-rax-dfw-0023379061 nova-compute[53492]: DEBUG os_brick.initiator.connectors.iscsi [None req-a7c2d69f-2b26-46a5-9588-0500ed40ecf2 tempest-LiveMigrationTest-303521084 tempest-LiveMigrationTest-303521084-project] Connected to sda using {'target_discovered': False, 'target_portal': '10.208.224.42:3260', 'target_iqn': 'iqn.2010-10.org.openstack:volume-3d034a25-2301-44f7-b10a-7e0a46402c29', 'target_lun': 1, 'volume_id': '3d034a25-2301-44f7-b10a-7e0a46402c29', 'auth_method': 'CHAP', 'auth_username': 'V2L87oQWgpAL7wPMRFng', 'auth_password': '***', 'encrypted': False, 'qos_specs': None, 'access_mode': 'rw', 'cacheable': False} {{(pid=53492) _connect_vol /usr/local/lib/python3.8/dist-packages/os_brick/initiator/connectors/iscsi.py:675}}
Mar 08 22:29:56.732310 ubuntu-focal-rax-dfw-0023379061 nova-compute[53492]: DEBUG os_brick.initiator.connectors.iscsi [None req-a7c2d69f-2b26-46a5-9588-0500ed40ecf2 tempest-LiveMigrationTest-303521084 tempest-LiveMigrationTest-303521084-project] <== connect_volume: return (1163ms) {'type': 'block', 'scsi_wwn': '360000000000000000e00000000010001', 'path': '/dev/sda'} {{(pid=53492) trace_logging_wrapper /usr/local/lib/python3.8/dist-packages/os_brick/utils.py:171}}
Mar 08 22:29:56.732775 ubuntu-focal-rax-dfw-0023379061 nova-compute[53492]: DEBUG nova.virt.libvirt.volume.iscsi [None req-a7c2d69f-2b26-46a5-9588-0500ed40ecf2 tempest-LiveMigrationTest-303521084 tempest-LiveMigrationTest-303521084-project] Attached iSCSI volume {'type': 'block', 'scsi_wwn': '360000000000000000e00000000010001', 'path': '/dev/sda'} {{(pid=53492) connect_volume /opt/stack/nova/nova/virt/libvirt/volume/iscsi.py:65}}

Mar 08 22:30:02.484937 ubuntu-focal-rax-dfw-0023379061 nova-compute[53492]: DEBUG os_brick.initiator.connectors.iscsi [None req-2ade7ce2-8fca-4d71-8c30-496fa8e8fce4 tempest-LiveAutoBlockMigrationV225Test-766901728 tempest-LiveAutoBlockMigrationV225Test-766901728-project-admin] Connected to sdb using {'target_discovered': False, 'target_portal': '10.208.224.30:3260', 'target_iqn': 'iqn.2010-10.org.openstack:volume-2384d911-102d-4b91-bd63-4c73baabd6b6', 'target_lun': 1, 'volume_id': '2384d911-102d-4b91-bd63-4c73baabd6b6', 'auth_method': 'CHAP', 'auth_username': 'rpEQU9A9uBRqDkLWVwkL', 'auth_password': '***', 'encrypted': False, 'qos_specs': None, 'access_mode': 'rw', 'cacheable': False} {{(pid=53492) _connect_vol /usr/local/lib/python3.8/dist-packages/os_brick/initiator/connectors/iscsi.py:675}}
Mar 08 22:30:02.487079 ubuntu-focal-rax-dfw-0023379061 nova-compute[53492]: DEBUG os_brick.initiator.connectors.iscsi [None req-2ade7ce2-8fca-4d71-8c30-496fa8e8fce4 tempest-LiveAutoBlockMigrationV225Test-766901728 tempest-LiveAutoBlockMigrationV225Test-766901728-project-admin] <== connect_volume: return (1164ms) {'type': 'block', 'scsi_wwn': '360000000000000000e00000000010001', 'path...


Lee Yarwood (lyarwood) wrote :

Fun, so this appears to be the result of the tgtd backend still being used by Ubuntu-based CI hosts. On a Fedora 32 based multinode env I've just deployed using lioadm, I see different WWNs for each volume from the different LVM/iSCSI c-vol backends:

$ ll /dev/disk/by-id/wwn-0x6001405*
lrwxrwxrwx. 1 root root 9 Mar 9 13:31 /dev/disk/by-id/wwn-0x600140525061514ba074fd880aac7d00 -> ../../sdb
lrwxrwxrwx. 1 root root 9 Mar 9 13:32 /dev/disk/by-id/wwn-0x60014056e6d98ce391f402ea4e6f4e29 -> ../../sdc

I'll rebuild with Focal now and confirm the behaviour with tgtd.

Lee Yarwood (lyarwood) wrote :

I've reproduced the single-WWN part on a Focal based multinode env, but I've not yet reproduced the odd detach behaviour seen in the nova-live-migration job.

$ openstack server create --flavor 1 --image cirros-0.5.1-x86_64-disk --network private test

Two volumes are created in the two different backends:

$ openstack volume create --size 1 test-1
$ openstack volume create --size 1 test-2

stack@devstack-focal-ctrl:~/devstack$ sudo lvs | grep volume-
  volume-12e58e2e-41cb-46dd-b294-4feb836f9434 stack-volumes-lvmdriver-1 Vwi-aotz-- 1.00g stack-volumes-lvmdriver-1-pool 0.00

stack@devstack-focal-cpu:~/devstack$ sudo lvs | grep volume-
  volume-3e72097c-6281-4cbb-b676-337eb7e267d1 stack-volumes-lvmdriver-1 Vwi-aotz-- 1.00g stack-volumes-lvmdriver-1-pool 0.00

Attached to a test instance:

$ openstack server add volume test test-1
$ openstack server add volume test test-2

os-brick has told n-cpu to attach /dev/sd{b,c}:

$ sudo virsh domblklist 5f28ff81-93d6-48e9-bdca-7b67d3eea835
 Target Source
------------------------------------------------------------------------------------
 vda /opt/stack/data/nova/instances/5f28ff81-93d6-48e9-bdca-7b67d3eea835/disk
 vdb /dev/sdb
 vdc /dev/sdc

However we only see a single WWN under /dev/disk/by-id/wwn*:

$ ll /dev/disk/by-id/wwn-*
lrwxrwxrwx 1 root root 10 Mar 9 21:08 /dev/disk/by-id/wwn-0x60000000000000000e00000000010001 -> ../../dm-5

Oddly I also see the devices being picked up as multipath devices because of the duplicate WWN?!

$ lsblk
[..]
sdb 8:16 0 1G 0 disk
└─mpatha 253:5 0 1G 0 mpath
sdc 8:32 0 1G 0 disk
└─mpatha 253:5 0 1G 0 mpath

I didn't configure multipathd on this deployment, yet here it is configured and providing an mpath device:

$ sudo multipath -ll
mpatha (360000000000000000e00000000010001) dm-5 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 7:0:0:1 sdb 8:16 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 8:0:0:1 sdc 8:32 active ready running
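(Not something tried in this report, but if multipathd keeps folding the duplicate-WWN devices into a single map on a test box, a hypothetical workaround would be to blacklist that wwid in /etc/multipath.conf — the wwid below is the duplicate one from the logs:

```
blacklist {
    wwid "360000000000000000e00000000010001"
}
```

followed by `sudo multipath -F` to flush the existing unused maps so the blacklist takes effect.)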

Lee Yarwood (lyarwood) wrote :

Final update today: with mpath disabled I get the behaviour seen in c#0 when the encrypted volume test failed:

$ openstack --os-compute-api-version 2.latest server create --flavor 1 --image cirros-0.5.1-x86_64-disk --host devstack-focal-cpu --network private test-1
$ openstack --os-compute-api-version 2.latest server create --flavor 1 --image cirros-0.5.1-x86_64-disk --host devstack-focal-cpu --network private test-2
$ openstack volume create --size 1 test-1
$ openstack volume create --size 1 test-2
$ openstack server add volume test-1 test-1

stack@devstack-focal-cpu:~/devstack $ ll /dev/disk/by-id/wwn-*
lrwxrwxrwx 1 root root 9 Mar 9 22:42 /dev/disk/by-id/wwn-0x60000000000000000e00000000010001 -> ../../sdb

$ openstack server add volume test-2 test-2

stack@devstack-focal-cpu:~/devstack $ ll /dev/disk/by-id/wwn-*
lrwxrwxrwx 1 root root 9 Mar 9 22:42 /dev/disk/by-id/wwn-0x60000000000000000e00000000010001 -> ../../sdc

I can't however reproduce any of the detach failures seen in the nova-live-migration job.

After a brief exchange on the ML I've posted the following WIP to move Focal based CI envs over to lioadm to try to avoid this:

WIP cinder: Default CINDER_ISCSI_HELPER to lioadm on Ubuntu
https://review.opendev.org/c/openstack/devstack/+/779624
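
Until that change merges, individual devstack deployments can opt in themselves; a minimal local.conf fragment would be:

```ini
[[local|localrc]]
CINDER_ISCSI_HELPER=lioadm
```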

Changed in cinder:
status: New → In Progress
assignee: nobody → Lee Yarwood (lyarwood)
importance: Undecided → High

Gorka Eguileor (gorka) wrote :

I am no expert on STGT, since I always work with LIO, but from what I
could gather this seems to be caused by the combination of us:

- Using the tgtadm helper
- Having 2 different cinder-volume services running on 2 different hosts
  (one on the compute and another on the controller).
- Using the same volume_backend_name for both LVM backends.

If we were running a single cinder-volume service with 2 backends this
issue wouldn't happen (I checked).
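
For reference, a single-service multibackend layout along those lines would look roughly like the cinder.conf sketch below (the backend names, volume group names and the target_helper value here are illustrative, not taken from this report):

```ini
[DEFAULT]
enabled_backends = lvm-a,lvm-b

[lvm-a]
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_group = stack-volumes-a
volume_backend_name = lvm-a
target_helper = lioadm

[lvm-b]
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_group = stack-volumes-b
volume_backend_name = lvm-b
target_helper = lioadm
```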

If we used a different volume_backend_name for each of the 2 services
and used a volume type picking one of them for the operations, this
wouldn't happen either.

If we used LIO instead, this wouldn't happen.

The cause is the automatic generation of the serial/WWN for volumes by
STGT, which appears to be deterministic. The first target created on a
host will have a 60000000000000000e0000000001 prefix followed by the LUN
number (the leading 3 that we see in the connection_info just states
that the WWN is of NAA type).

This means that the first volume exposed by STGT on any host will ALWAYS
have the same WWN, and that will mess things up if we attach them to the
same host: the whole premise of a WWN is its uniqueness, and everything
in Cinder and os-brick assumes this; that assumption will not be changed.
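
To make that structure concrete, the duplicated WWN from the logs splits up as follows (a bash sketch; the offsets simply carve the 33-character value into the parts described above):

```shell
#!/bin/bash
# Split the NAA WWN reported by os-brick into its parts.
wwn="360000000000000000e00000000010001"

echo "designator type: ${wwn:0:1}"   # 3 -> NAA-type designator
echo "host prefix:     ${wwn:1:28}"  # deterministic per-host STGT prefix
echo "lun:             ${wwn:29:4}"  # LUN number (here LUN 1)
```

Since the first target on every host gets the same prefix and the same LUN number, the full WWN collides across hosts.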

For LIO it seems that the generation of the serial/WWN is
non-deterministic (or at least not the same on all hosts), so the issue
won't happen in this specific deployment configuration.

So the options to prevent this issue are to run both backends on the
controller node, use different volume_backend_name and a volume type, or
use LIO.

Lee Yarwood (lyarwood) wrote :

I'm not entirely sure how using a different volume_backend_name would help? As you say above, the first target on both hosts would still have the 60000000000000000e0000000001 prefix regardless of the name, right?

Moving to a single-service multibackend approach would be best, but given the required job changes etc. it isn't something I think we can do in the short term.

Moving to lioadm is still my preferred short-term solution, with the following devstack change awaiting review:

cinder: Default CINDER_ISCSI_HELPER to lioadm on Ubuntu
https://review.opendev.org/c/openstack/devstack/+/779624
