OpenStack Compute (nova)

Bug #1732199
Comment #7

Gorka Eguileor (gorka) wrote on 2018-09-24:

It looks like when the extend happens, the target has "problems", which results in an iSCSI connection failure [1]:

  Sep 24 10:55:24 ubuntu-xenial-rax-ord-0002235107 kernel: connection21:0: detected conn error (1020)

This in turn makes the iscsiadm tool return an error when it is called [2]:

  Sep 24 10:55:26.759691 ubuntu-xenial-rax-ord-0002235107 nova-compute[15157]: WARNING os_brick.initiator.connectors.iscsi [req-eb1112fc-2010-4878-b6ed-7bd6d5c65b82 req-118f3f55-90c0-47b4-98c9-9b8069625dce service nova] Couldn't find iscsi sessions because iscsiadm err: iscsiadm: could not read session targetname: 5
  Sep 24 10:55:26.760040 ubuntu-xenial-rax-ord-0002235107 nova-compute[15157]: iscsiadm: could not find session info for session22

However, just a couple of seconds later the "iscsiadm -m session" command runs successfully for another test [3].

The connection for the first device also seems to recover, and the size change is detected [4]:

  Sep 24 10:55:28 ubuntu-xenial-rax-ord-0002235107 kernel: sd 22:0:0:1: [sda] 4194304 512-byte logical blocks: (2.15 GB/2.00 GiB)
  Sep 24 10:55:28 ubuntu-xenial-rax-ord-0002235107 kernel: sda: detected capacity change from 1073741824 to 2147483648
  Sep 24 10:55:28 ubuntu-xenial-rax-ord-0002235107 kernel: sd 22:0:0:1: [sda] Synchronizing SCSI cache
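
As a quick sanity check on these kernel messages: 4194304 blocks of 512 bytes is 2147483648 bytes, so the device went from 1 GiB to 2 GiB as expected. A small Python sketch of that check, assuming the "detected capacity change from X to Y" wording shown above stays stable; the helper name parse_capacity_change is hypothetical:

```python
import re

# Assumes the exact kernel wording seen in the syslog excerpt above.
CAPACITY_RE = re.compile(r"detected capacity change from (\d+) to (\d+)")


def parse_capacity_change(line):
    """Return (old_bytes, new_bytes) from a kernel capacity-change line, or None."""
    match = CAPACITY_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = ("Sep 24 10:55:28 ubuntu-xenial-rax-ord-0002235107 kernel: "
        "sda: detected capacity change from 1073741824 to 2147483648")
old, new = parse_capacity_change(line)
GIB = 1024 ** 3
print(old / GIB, new / GIB)   # 1.0 2.0
# 4194304 logical blocks of 512 bytes, as reported by the sd driver line
print(4194304 * 512 == new)   # True
```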

In summary, it looks like the extend is the cause of the issue, as it creates connection errors that then make the iscsiadm command fail.

This can probably be fixed in os-brick by adding a simple retry decorator with exponential backoff to the extend_volume method.
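
A minimal sketch of such a decorator in plain Python. This is not os-brick code: retry_with_backoff, TransientISCSIError and the toy extend_volume below are hypothetical stand-ins, and the injectable sleep argument exists only so the backoff can be observed without actually waiting.

```python
import functools
import time


class TransientISCSIError(Exception):
    """Hypothetical exception standing in for a transient iscsiadm failure."""


def retry_with_backoff(exceptions, retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry the wrapped call on `exceptions`, doubling the delay after each failure.

    With base_delay=1.0 and retries=3 the waits are 1s, 2s and 4s before the
    last error is re-raised.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == retries:
                        raise  # out of retries: surface the last error
                    sleep(delay)
                    delay *= 2
        return wrapper
    return decorator


# Toy stand-in for a connector's extend_volume: fails twice, then succeeds.
attempts = {"count": 0}
waits = []


@retry_with_backoff(TransientISCSIError, retries=3, base_delay=1.0, sleep=waits.append)
def extend_volume():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientISCSIError("iscsiadm: could not read session targetname")
    return 2147483648  # new size in bytes


print(extend_volume())  # 2147483648
print(waits)            # [1.0, 2.0]
```

Doubling the delay gives the target a few seconds to recover, which matches the log above where iscsiadm works again about two seconds after the error.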

However, if extending a volume breaks "iscsiadm -m session", then the extend probably needs to be a mutually exclusive operation, or we need a new way of retrieving the session information that does not rely on iscsiadm.

[1]: http://logs.openstack.org/69/595069/2/gate/tempest-full/169db43/controller/logs/syslog.txt.gz#_Sep_24_10_55_24
[2]: http://logs.openstack.org/69/595069/2/gate/tempest-full/169db43/controller/logs/screen-n-cpu.txt.gz#_Sep_24_10_55_26_756780
[3]: http://logs.openstack.org/69/595069/2/gate/tempest-full/169db43/controller/logs/screen-n-cpu.txt.gz#_Sep_24_10_55_27_881039
[4]: http://logs.openstack.org/69/595069/2/gate/tempest-full/169db43/controller/logs/syslog.txt.gz#_Sep_24_10_55_28