Comment 18 for bug 1436999

Paul Halmos (paul-halmos) wrote:

I manually rolled the changes into an environment today on a single cinder node. The tenant had shut down the instances with the volumes attached prior to the maintenance. After rebooting the cinder node, the state files located in /var/lib/cinder/volumes/ were found to be missing. This manifested itself when attempting to start an instance that had a volume on the affected cinder node:

2015-06-09 18:54:58.804 19459 TRACE oslo.messaging.rpc.dispatcher libvirtError: Failed to open file '/dev/disk/by-path/ip-10.17.150.69:3260-iscsi-iqn.2010-10.org.openstack:volume-ab377917-b1fa-416a-b001-ef6b9ff09715-lun-1': No such device or address

When attempting to discover the targets, the following errors were seen:

# iscsiadm -m discovery -t st -p 10.17.150.69:3260
iscsiadm: Connection to Discovery Address 10.17.150.69 closed
iscsiadm: Login I/O error, failed to receive a PDU
iscsiadm: retrying discovery login to 10.17.150.69

2015-06-09 19:23:43.073 19459 TRACE oslo.messaging.rpc.dispatcher Command: sudo nova-rootwrap /etc/nova/rootwrap.conf iscsiadm -m node -T iqn.2010-10.org.openstack:volume-ab377917-b1fa-416a-b001-ef6b9ff09715 -p 10.17.150.69:3260 --rescan
2015-06-09 19:23:43.073 19459 TRACE oslo.messaging.rpc.dispatcher Exit code: 21
2015-06-09 19:23:43.073 19459 TRACE oslo.messaging.rpc.dispatcher Stdout: u''
2015-06-09 19:23:43.073 19459 TRACE oslo.messaging.rpc.dispatcher Stderr: u'iscsiadm: No session found.\n'
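
If I read it right, exit code 21 from iscsiadm is its "no records/targets/sessions found" error, which together with the failed discovery pointed at the targets simply no longer being exported from the cinder node. Assuming the default tgt helper (which the traceback below suggests), the currently exported targets can be listed on the cinder node with something like:

# tgtadm --lld iscsi --mode target --op show

In this state I would expect the volume's target to be absent from that output.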

On the cinder node:

2015-06-09 18:29:18.882 1411 ERROR cinder.volume.manager [req-62b66a03-c879-482c-8be5-942e5b35180d - - - - -] Failed to re-export volume ab377917-b1fa-416a-b001-ef6b9ff09715: setting to error state
2015-06-09 18:29:18.883 1411 ERROR cinder.volume.manager [req-62b66a03-c879-482c-8be5-942e5b35180d - - - - -] Failed to create iscsi target for volume volume-ab377917-b1fa-416a-b001-ef6b9ff09715.
2015-06-09 18:29:18.883 1411 TRACE cinder.volume.manager Traceback (most recent call last):
2015-06-09 18:29:18.883 1411 TRACE cinder.volume.manager File "/usr/local/lib/python2.7/dist-packages/cinder/volume/manager.py", line 276, in init_host
2015-06-09 18:29:18.883 1411 TRACE cinder.volume.manager self.driver.ensure_export(ctxt, volume)
2015-06-09 18:29:18.883 1411 TRACE cinder.volume.manager File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 105, in wrapper
2015-06-09 18:29:18.883 1411 TRACE cinder.volume.manager return f(*args, **kwargs)
2015-06-09 18:29:18.883 1411 TRACE cinder.volume.manager File "/usr/local/lib/python2.7/dist-packages/cinder/volume/drivers/lvm.py", line 543, in ensure_export
2015-06-09 18:29:18.883 1411 TRACE cinder.volume.manager self.configuration)
2015-06-09 18:29:18.883 1411 TRACE cinder.volume.manager File "/usr/local/lib/python2.7/dist-packages/cinder/volume/iscsi.py", line 116, in ensure_export
2015-06-09 18:29:18.883 1411 TRACE cinder.volume.manager write_cache=conf.iscsi_write_cache)
2015-06-09 18:29:18.883 1411 TRACE cinder.volume.manager File "/usr/local/lib/python2.7/dist-packages/cinder/brick/iscsi/iscsi.py", line 249, in create_iscsi_target
2015-06-09 18:29:18.883 1411 TRACE cinder.volume.manager raise exception.ISCSITargetCreateFailed(volume_id=vol_id)
2015-06-09 18:29:18.883 1411 TRACE cinder.volume.manager ISCSITargetCreateFailed: Failed to create iscsi target for volume volume-ab377917-b1fa-416a-b001-ef6b9ff09715.
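
For context, the missing files under /var/lib/cinder/volumes/ are the per-volume target definitions the tgt helper persists, so in principle they can be rebuilt by hand. A rough sketch of what one looks like is below; the backing-store path, CHAP credentials and write-cache value are placeholders from a typical LVM/tgt setup, not values pulled from this environment:

# cat > /var/lib/cinder/volumes/volume-ab377917-b1fa-416a-b001-ef6b9ff09715 <<'EOF'
<target iqn.2010-10.org.openstack:volume-ab377917-b1fa-416a-b001-ef6b9ff09715>
    backing-store /dev/cinder-volumes/volume-ab377917-b1fa-416a-b001-ef6b9ff09715
    driver iscsi
    incominguser <chap-username> <chap-password>
    write-cache on
</target>
EOF

After that, restarting cinder-volume (or, I believe, running tgt-admin --update ALL) should re-export the target.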

The short-term fix was to hack mysql and set attach_status='detached'. I could then re-attach the volume to the instance. That action created the state file in /var/lib/cinder/volumes/. The downside to this fix was that it created duplicate volume attachments on the instance:

nova show:
….
os-extended-volumes:volumes_attached | [{"id": "bc92b4ee-a6c4-430b-98c1-dbbb2ae22a78"}, {"id": "7cd4b5da-8990-49d1-adf9-ea72c6a7b976"}, {"id": "ab377917-b1fa-416a-b001-ef6b9ff09715"}, {"id": "ab377917-b1fa-416a-b001-ef6b9ff09715"}, {"id": "633b5b59-8e6c-45b3-9245-fd3d530b015a"}, {"id": "633b5b59-8e6c-45b3-9245-fd3d530b015a"}]
….

This caused issues for the tenant's instance, as the RAID volumes failed to activate. Once I cleaned up nova.block_device_mapping and set deleted=1 for the duplicate entries, I was able to get the instance to see the correct volumes. It is unclear why the state files were removed from /var/lib/cinder/volumes. In hindsight, these can be re-created by hand from the data in nova.block_device_mapping; alternatively, prior to rolling this change out, the instances should be shut down and the volumes detached. Post-update, the volumes can be re-attached.
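
For reference, the database edits described above were roughly of the following form (a sketch only; the database and table names assume the default cinder and nova schemas, and the row id is a placeholder for whichever duplicate block_device_mapping entries exist):

# mysql cinder -e "UPDATE volumes SET attach_status='detached' WHERE id='ab377917-b1fa-416a-b001-ef6b9ff09715';"
# mysql nova -e "UPDATE block_device_mapping SET deleted=1 WHERE id=<duplicate row id>;"

The second statement would be repeated once per duplicate row.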