Nova and Brick can log each other out of iscsi sessions

Bug #1429684 reported by Alex Meade
Affects                    Status    Importance  Assigned to
Cinder                     Invalid   High        Zhang Hao
OpenStack Compute (nova)   Expired   Undecided   Unassigned

Bug Description

Brick and Nova do not synchronize on the same connect_volume lock. When Nova and Cinder run on the same node, this can cause one service to log out of an iSCSI portal while the other is attempting to use it.

This may seem like a rare situation, but it occurs regularly in our CI system because we perform many operations involving both Nova and Brick concurrently. It most likely happens when attaching/detaching a volume to an instance while also attaching a volume directly to the node for image operations.
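
Roughly, the mismatch looks like this (a minimal sketch, not the actual Nova or Brick code; it assumes the per-service lock file prefixes, 'nova-' vs 'brick-', that are discussed in the comments below):

    from oslo_concurrency import lockutils

    # Nova's critical section takes an external (file-based) lock whose
    # on-disk name is built from the 'nova-' prefix plus the lock name.
    @lockutils.synchronized('connect_volume', 'nova-', external=True)
    def nova_connect_volume():
        pass  # login / rescan / wait for the LUN to appear

    # Cinder/Brick locks the same name under a 'brick-' prefix, i.e. a
    # *different* lock file, so the two sections never exclude each other.
    @lockutils.synchronized('connect_volume', 'brick-', external=True)
    def brick_disconnect_volume():
        pass  # may '--logout' the shared session while Nova is mid-rescan

Because the two decorators resolve to different lock files, Cinder's logout can land inside Nova's rescan/retry window, which is exactly what the logs below show.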

In the case below, Cinder logged out of the iSCSI session while Nova was retrying rescans, attempting to detect the new LUN.

Cinder volume logs:

2015-03-07 17:27:14.288 28940 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): sudo cinder-rootwrap /etc/cinder/rootwrap.conf iscsiadm -m node -T iqn.1992-08.com.netapp:sn.60a8eb0cc4bb11e4b041123478563412:vs.40815 -p 10.250.119.127:3260 --logout execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:199

2015-03-07 17:27:14.875 28940 DEBUG oslo_concurrency.processutils [-] CMD "sudo cinder-rootwrap /etc/cinder/rootwrap.conf iscsiadm -m node -T iqn.1992-08.com.netapp:sn.60a8eb0cc4bb11e4b041123478563412:vs.40815 -p 10.250.119.127:3260 --logout" returned: 0 in 0.588s execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:225

2015-03-07 17:27:14.876 28940 DEBUG cinder.brick.initiator.connector [-] iscsiadm ('--logout',): stdout=Logging out of session [sid: 1, target: iqn.1992-08.com.netapp:sn.60a8eb0cc4bb11e4b041123478563412:vs.40815, portal: 10.250.119.127,3260]

Logout of [sid: 1, target: iqn.1992-08.com.netapp:sn.60a8eb0cc4bb11e4b041123478563412:vs.40815, portal: 10.250.119.127,3260] successful.

Nova compute logs:

2015-03-07 17:27:12.617 DEBUG nova.virt.libvirt.volume [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] iscsiadm ('--rescan',): stdout=Rescanning session [sid: 1, target: iqn.1992-08.com.netapp:sn.60a8eb0cc4bb11e4b041123478563412:vs.40815, portal: 10.250.119.127,3260]

 stderr= _run_iscsiadm /opt/stack/new/nova/nova/virt/libvirt/volume.py:364

2015-03-07 17:27:12.617 WARNING nova.virt.libvirt.volume [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] ISCSI volume not yet found at: vdb. Will rescan & retry. Try number: 0

2015-03-07 17:27:12.618 DEBUG oslo_concurrency.processutils [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.1992-08.com.netapp:sn.60a8eb0cc4bb11e4b041123478563412:vs.40815 -p 10.250.119.127:3260 --rescan execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:199

2015-03-07 17:27:13.503 DEBUG oslo_concurrency.processutils [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] CMD "sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.1992-08.com.netapp:sn.60a8eb0cc4bb11e4b041123478563412:vs.40815 -p 10.250.119.127:3260 --rescan" returned: 0 in 0.885s execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:225

2015-03-07 17:27:13.504 DEBUG nova.virt.libvirt.volume [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] iscsiadm ('--rescan',): stdout=Rescanning session [sid: 1, target: iqn.1992-08.com.netapp:sn.60a8eb0cc4bb11e4b041123478563412:vs.40815, portal: 10.250.119.127,3260]

 stderr= _run_iscsiadm /opt/stack/new/nova/nova/virt/libvirt/volume.py:364

2015-03-07 17:27:14.504 WARNING nova.virt.libvirt.volume [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] ISCSI volume not yet found at: vdb. Will rescan & retry. Try number: 1

2015-03-07 17:27:14.505 DEBUG oslo_concurrency.processutils [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.1992-08.com.netapp:sn.60a8eb0cc4bb11e4b041123478563412:vs.40815 -p 10.250.119.127:3260 --rescan execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:199

2015-03-07 17:27:14.563 DEBUG oslo_concurrency.processutils [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] CMD "sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.1992-08.com.netapp:sn.60a8eb0cc4bb11e4b041123478563412:vs.40815 -p 10.250.119.127:3260 --rescan" returned: 0 in 0.058s execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:225

2015-03-07 17:27:14.564 DEBUG nova.virt.libvirt.volume [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] iscsiadm ('--rescan',): stdout=Rescanning session [sid: 1, target: iqn.1992-08.com.netapp:sn.60a8eb0cc4bb11e4b041123478563412:vs.40815, portal: 10.250.119.127,3260]

 stderr= _run_iscsiadm /opt/stack/new/nova/nova/virt/libvirt/volume.py:364

2015-03-07 17:27:18.564 WARNING nova.virt.libvirt.volume [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] ISCSI volume not yet found at: vdb. Will rescan & retry. Try number: 2

2015-03-07 17:27:18.565 DEBUG oslo_concurrency.processutils [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] Running cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.1992-08.com.netapp:sn.60a8eb0cc4bb11e4b041123478563412:vs.40815 -p 10.250.119.127:3260 --rescan execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:199

2015-03-07 17:27:18.626 DEBUG oslo_concurrency.processutils [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] CMD "sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.1992-08.com.netapp:sn.60a8eb0cc4bb11e4b041123478563412:vs.40815 -p 10.250.119.127:3260 --rescan" returned: 21 in 0.061s execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:225

2015-03-07 17:27:18.627 DEBUG oslo_concurrency.processutils [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] u'sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.1992-08.com.netapp:sn.60a8eb0cc4bb11e4b041123478563412:vs.40815 -p 10.250.119.127:3260 --rescan' failed. Not Retrying. execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:258

2015-03-07 17:27:18.627 DEBUG oslo_concurrency.lockutils [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] Lock "connect_volume" released by "connect_volume" :: held 6.588s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:442

2015-03-07 17:27:18.627 ERROR nova.virt.block_device [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] Driver failed to attach volume 673c93f7-97c0-44da-bed2-2b7e4106ac2d at /dev/vdb

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] Traceback (most recent call last):

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/opt/stack/new/nova/nova/virt/block_device.py", line 249, in attach

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] device_type=self['device_type'], encryption=encryption)

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 1036, in attach_volume

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] self._connect_volume(connection_info, disk_info)

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 987, in _connect_volume

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] driver.connect_volume(connection_info, disk_info)

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 431, in inner

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] return f(*args, **kwargs)

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/opt/stack/new/nova/nova/virt/libvirt/volume.py", line 457, in connect_volume

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] self._run_iscsiadm(iscsi_properties, ("--rescan",))

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/opt/stack/new/nova/nova/virt/libvirt/volume.py", line 359, in _run_iscsiadm

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] check_exit_code=check_exit_code)

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/opt/stack/new/nova/nova/utils.py", line 205, in execute

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] return processutils.execute(*cmd, **kwargs)

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 233, in execute

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] cmd=sanitized_cmd)

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] ProcessExecutionError: Unexpected error while running command.

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] Command: sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.1992-08.com.netapp:sn.60a8eb0cc4bb11e4b041123478563412:vs.40815 -p 10.250.119.127:3260 --rescan

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] Exit code: 21

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] Stdout: u''

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] Stderr: u'iscsiadm: No session found.\n'

2015-03-07 17:27:18.627 28765 TRACE nova.virt.block_device [instance: 10476e16-0f9b-45b3-b759-3718c48150e2]

2015-03-07 17:27:18.631 DEBUG keystoneclient.session [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] REQ: curl -g -i -X POST http://127.0.0.1:8776/v2/b275c406f15c4e3c88b9cc2f201430a7/volumes/673c93f7-97c0-44da-bed2-2b7e4106ac2d/action -H "User-Agent: python-cinderclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}62885ded2920dbd143477d615394a9ff6e32ad6e" -d '{"os-terminate_connection": {"connector": {"ip": "10.250.121.14", "host": "slave-52122-devstack-trusty-cdot-iscsi-rax-iad", "initiator": "iqn.1993-08.org.debian:01:e8a2f229da"}}}' _http_log_request /usr/local/lib/python2.7/dist-packages/keystoneclient/session.py:195

2015-03-07 17:27:18.632 DEBUG urllib3.util.retry [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0) from_int /usr/local/lib/python2.7/dist-packages/urllib3/util/retry.py:155

2015-03-07 17:27:19.109 DEBUG keystoneclient.session [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] RESP: [202] date: Sat, 07 Mar 2015 17:27:19 GMT content-length: 0 content-type: text/html; charset=UTF-8 x-openstack-request-id: req-e5cef5b1-b0b9-47af-a9fd-e462be8862f0 _http_log_response /usr/local/lib/python2.7/dist-packages/keystoneclient/session.py:223

2015-03-07 17:27:19.110 ERROR nova.compute.manager [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] Failed to attach 673c93f7-97c0-44da-bed2-2b7e4106ac2d at /dev/vdb

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] Traceback (most recent call last):

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/opt/stack/new/nova/nova/compute/manager.py", line 4654, in _attach_volume

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] do_check_attach=False, do_driver_attach=True)

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/opt/stack/new/nova/nova/virt/block_device.py", line 48, in wrapped

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] ret_val = method(obj, context, *args, **kwargs)

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/opt/stack/new/nova/nova/virt/block_device.py", line 258, in attach

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] connector)

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 82, in __exit__

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] six.reraise(self.type_, self.value, self.tb)

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/opt/stack/new/nova/nova/virt/block_device.py", line 249, in attach

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] device_type=self['device_type'], encryption=encryption)

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 1036, in attach_volume

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] self._connect_volume(connection_info, disk_info)

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 987, in _connect_volume

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] driver.connect_volume(connection_info, disk_info)

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 431, in inner

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] return f(*args, **kwargs)

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/opt/stack/new/nova/nova/virt/libvirt/volume.py", line 457, in connect_volume

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] self._run_iscsiadm(iscsi_properties, ("--rescan",))

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/opt/stack/new/nova/nova/virt/libvirt/volume.py", line 359, in _run_iscsiadm

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] check_exit_code=check_exit_code)

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/opt/stack/new/nova/nova/utils.py", line 205, in execute

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] return processutils.execute(*cmd, **kwargs)

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 233, in execute

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] cmd=sanitized_cmd)

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] ProcessExecutionError: Unexpected error while running command.

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] Command: sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.1992-08.com.netapp:sn.60a8eb0cc4bb11e4b041123478563412:vs.40815 -p 10.250.119.127:3260 --rescan

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] Exit code: 21

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] Stdout: u''

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2] Stderr: u'iscsiadm: No session found.\n'

2015-03-07 17:27:19.110 28765 TRACE nova.compute.manager [instance: 10476e16-0f9b-45b3-b759-3718c48150e2]

2015-03-07 17:27:19.111 DEBUG keystoneclient.session [req-55f33c70-ec85-4041-aaf6-205f74abf979 VolumesV1SnapshotTestJSON-1966398854 VolumesV1SnapshotTestJSON-1188982339] REQ: curl -g -i -X POST http://127.0.0.1:8776/v2/b275c406f15c4e3c88b9cc2f201430a7/volumes/673c93f7-97c0-44da-bed2-2b7e4106ac2d/action -H "User-Agent: python-cinderclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}62885ded2920dbd143477d615394a9ff6e32ad6e" -d '{"os-unreserve": null}' _http_log_request /usr/local/lib/python2.7/dist-packages/keystoneclient/session.py:195

Alex Meade (alex-meade) wrote:

This is something we should keep in mind when having other projects use Brick. If Brick works only as a library, then there will need to be cross-service locking on the Brick operations.

Davanum Srinivas (DIMS) (dims-v) wrote:

Nova does not use brick at the moment, right? What needs to change in Nova?

Changed in cinder:
status: New → Incomplete
Changed in cinder:
status: Incomplete → New
Changed in nova:
status: New → Incomplete
Alex Meade (alex-meade) wrote:

Correct, Nova does not currently use Brick.

The code in:
cinder/brick/initiator/connector.py
and
./nova/virt/libvirt/volume.py
are very similar.

Both connect_volume and disconnect_volume use 'utils.synchronized('connect_volume')', which will only lock within that process and not across both Cinder and Nova. One example of why this is an issue: the retries looking for a LUN here:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/volume.py#L446

can be sabotaged if

https://github.com/openstack/cinder/blob/master/cinder/brick/initiator/connector.py#L399

is called within the retry window.

I actually think a way to mitigate this issue would be to ensure we are connected to the iSCSI portal within the retry loop, as opposed to a guaranteed fix using file locks. I will test it out in our CI system and try to get a patch up soon. Hope this makes sense.
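
Something along these lines (a hypothetical sketch of what I mean, with illustrative helper and parameter names, not an actual patch):

    import os
    import time

    from oslo_concurrency import processutils

    def _session_exists(target_iqn):
        # 'iscsiadm -m session' exits 21 ("No session found") when no
        # sessions are active, so accept both 0 and 21 as success.
        out, _err = processutils.execute(
            'iscsiadm', '-m', 'session',
            run_as_root=True, root_helper='sudo', check_exit_code=[0, 21])
        return target_iqn in out

    def wait_for_lun(target_iqn, portal, device_path, tries=5):
        for attempt in range(tries):
            if os.path.exists(device_path):
                return device_path
            # If another process logged the session out mid-retry,
            # re-establish it before rescanning instead of failing.
            if not _session_exists(target_iqn):
                processutils.execute(
                    'iscsiadm', '-m', 'node', '-T', target_iqn,
                    '-p', portal, '--login',
                    run_as_root=True, root_helper='sudo')
            processutils.execute(
                'iscsiadm', '-m', 'node', '-T', target_iqn,
                '-p', portal, '--rescan',
                run_as_root=True, root_helper='sudo')
            time.sleep(attempt + 1)
        raise RuntimeError('iSCSI device %s never appeared' % device_path)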

Alex Meade (alex-meade)
Changed in nova:
status: Incomplete → New
Jay Bryant (jsbryant) wrote:

If this is affecting CI we should try to fix it soon. Thanks for reporting this.

Changed in cinder:
importance: Undecided → Medium
importance: Medium → High
John Griffith (john-griffith) wrote:

@Alex,
I'm sorry, I'm not quite following this. Given that we have unique targets for each volume, how do we get into a situation where we "clobber" the specific IQN from the Cinder side?

In other words, the initiator code in Cinder is only for things like image-update/download etc.; a volume shouldn't be eligible for attach calls from Nova during that state, and we can't do the local operations in Cinder when the volume is in-use.

Is this a NetApp specific thing with how you're mapping targets to host files?

Sean Dague (sdague) wrote:

Does this happen on any volume backend, or is it limited to a particular one?

As you are running a CI, I think we should raise the criteria here for moving out of Incomplete and actually get a reproduction scenario that we could run on a devstack node.

Changed in nova:
status: New → Incomplete
Walt Boring (walter-boring) wrote:

The local file locks in Nova are hard coded with a prefix of 'nova-'

The local file locks in cinder/brick are hard coded with a prefix of 'brick-'

We are going to move nova and cinder to both using the standalone os-brick library in Liberty.

If your CI is failing because of this issue, you can always patch your local cinder/brick/initiator/connector.py to change the prefix to 'nova-' as well. It's hackish, but then both Nova and cinder/brick would share the same nova-connect_volume local file locks.

We could also patch cinder/brick in Cinder to use the 'nova-' prefix, if this is a problem for everyone in Kilo.
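
To make the collision concrete, this is roughly how the lock file name is derived (a simplified paraphrase of oslo.concurrency's behavior, not its exact code); changing one prefix makes both services hit the same file:

    import os

    def lock_file(lock_path, lock_file_prefix, name):
        # Simplified: oslo.concurrency keeps one lock file per
        # prefix + name under the configured lock_path.
        return os.path.join(lock_path, lock_file_prefix + name)

    lock_file('/var/lock/openstack', 'nova-', 'connect_volume')
    # -> /var/lock/openstack/nova-connect_volume
    lock_file('/var/lock/openstack', 'brick-', 'connect_volume')
    # -> /var/lock/openstack/brick-connect_volume  (a different file,
    #    so the two services never exclude each other)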

Changed in cinder:
status: New → Triaged
Markus Zoeller (markus_z) (mzoeller) wrote:

@Sean Dague (sdague) wrote in comment #6:
> Does this happen on any volume backend?
IIUC what Walt Boring (walter-boring) said in comment #7, this is independent of the backend. So I think this is a confirmed bug.

@Walt Boring (walter-boring):
You started a WIP in Nova to use os-brick: https://review.openstack.org/#/c/175569/
Do you think that this will solve the issue when it is merged?

Changed in nova:
status: Incomplete → Confirmed
John Griffith (john-griffith) wrote:

@Walt and @Alex,

I'm still looking for an answer to my question; regardless of lock prefix etc., that should be irrelevant. The reason is that, at least as of right now, we don't allow/support multiple attachments at the same time, and this is supposed to be enforced from the Cinder-DB side of things.

For example, if the volume is attached to a Nova node, the status should be set to "in-use" on the Cinder side (and likewise if it's attached on the Cinder side). My point/question here is: why are we in a situation where multiple initiators are trying to connect and mess around with locks anyway? That seems like the root of the problem to me.

Also, FWIW, hacking up the local copy in the CI doesn't seem like a very good suggestion IMHO. If there's an issue in the design of brick-initiator that prevents us from running in parallel until if/when you move things into Nova, then we have a bigger problem.

FWIW, in any case I much prefer the suggestion by Alex to ensure the connection rather than deal with locks.

Alex Meade (alex-meade) wrote:

@John, this only occurs when Cinder and Nova are on the SAME node, so they are the same initiator. For a lot of backends, Cinder volumes are just different LUNs on the same target, which gives us two volumes with the same target.

This should be resolved now that Nova is using Brick and therefore both services will use the same external lock.

Walt Boring (walter-boring) wrote:

Can you re-verify this bug? Both Nova and Cinder are using the external os-brick library, which should lock on the same lock name. I believe this has been fixed.

Zhang Hao (zhang-hao16)
Changed in cinder:
assignee: nobody → Zhang Hao (zhang-hao16)
Zhang Hao (zhang-hao16) wrote:

@Alex Meade
We also hit this bug and made code changes; after a lot of testing in our environments, we have not seen the problem since.
We considered using a file lock at first too, but given the concurrency performance and cross-service interaction concerns, we did not go that way in the end. Instead we use a more flexible method, as follows:

In the current flow, when terminating a volume connection we usually delete the local device directly; only when removing the last device do we skip the delete and just log out of the session. We changed this to stop distinguishing the two scenarios: we always delete the local device and never log out, preserving the iSCSI session. On the next connect_volume, we check whether a session already exists; if one does, we do not log in again but simply run "iscsiadm --rescan" to refresh and create the local device.

In a number of tests this fixed the bug, and both the control node and the compute nodes, which keep the iSCSI session open, run stably.
I will submit a code review soon; please check whether this approach is feasible. Thanks!
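
A rough sketch of the flow described above (illustrative names only, not the actual patch under review):

    from oslo_concurrency import processutils

    def _iscsiadm(target_iqn, portal, *args):
        return processutils.execute(
            'iscsiadm', '-m', 'node', '-T', target_iqn, '-p', portal, *args,
            run_as_root=True, root_helper='sudo')

    def _session_exists(target_iqn):
        # Exit code 21 means "No session found"; treat it as "not logged in".
        out, _err = processutils.execute(
            'iscsiadm', '-m', 'session',
            run_as_root=True, root_helper='sudo', check_exit_code=[0, 21])
        return target_iqn in out

    def disconnect_volume(target_iqn, portal, device_name):
        # Always delete just this LUN's local device (requires root; shown
        # without rootwrap for brevity), and deliberately never '--logout',
        # so a concurrent connect on the same target is not interrupted.
        with open('/sys/block/%s/device/delete' % device_name, 'w') as f:
            f.write('1')

    def connect_volume(target_iqn, portal):
        # Reuse a live session when one exists; only log in the first time.
        if not _session_exists(target_iqn):
            _iscsiadm(target_iqn, portal, '--login')
        _iscsiadm(target_iqn, portal, '--rescan')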

John Garbutt (johngarbutt) wrote:

So Nova is now using brick. I presume brick chooses the correct file name for the lock file now.

Can this be reproduced on master?

Changed in nova:
status: Confirmed → Incomplete
Sean McGinnis (sean-mcginnis) wrote:

I don't believe this is an issue anymore, other than the caveat discussed elsewhere that nova and cinder need to use the same lock_dir if run on the same host. Please feel free to reopen if this is not the case.
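
For the single-host case, a hedged example of that caveat (the path is illustrative): both /etc/nova/nova.conf and /etc/cinder/cinder.conf need to point at the same lock directory, e.g.

    [oslo_concurrency]
    lock_path = /var/lib/openstack/lock

With matching lock_path values, os-brick's 'connect_volume' file lock is genuinely shared between the two services.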

Changed in cinder:
status: Triaged → Invalid
Launchpad Janitor (janitor) wrote:

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired