Nova and Cinder get desynced on volume attachments

Bug #1499012 reported by Edmund Rhudy
This bug affects 8 people
Affects: OpenStack Compute (nova)
Status: Expired
Importance: High
Assigned to: Unassigned

Bug Description

This bug occurred with Nova and Cinder 2015.1.0 installed.

When bulk deleting large numbers of Nova instances, we occasionally encounter something that appears to be a race condition where Cinder believes a volume is detached and available but Nova reports the volume is still attached to an instance.
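The failure mode is a classic check-then-act race across two databases: Cinder's volume record and Nova's block device mappings are updated independently, so one side can observe the other mid-transition. A deliberately simplified sketch of the sequence (hypothetical names and in-memory dicts, not actual Nova/Cinder code):

```python
# Hypothetical in-memory stand-ins for the two services' databases.
cinder_volumes = {"vol-1": "in-use"}   # Cinder's view of the volume
nova_bdms = {"inst-1": ["vol-1"]}      # Nova's block device mappings

def cinder_detach_completes(volume_id):
    # An earlier (bulk) delete request already drove the Cinder-side
    # detach to completion; the volume is now 'available'.
    cinder_volumes[volume_id] = "available"

def nova_terminate(instance_id):
    # Nova still holds a BDM for the volume and asks Cinder to detach
    # it, but Cinder no longer sees any attachment -- the error in
    # this report. Nothing re-synchronizes the two records.
    for volume_id in list(nova_bdms[instance_id]):
        if cinder_volumes[volume_id] != "in-use":
            raise RuntimeError(
                "Volume %s doesn't have any attachments to detach"
                % volume_id)
        cinder_volumes[volume_id] = "available"
        nova_bdms[instance_id].remove(volume_id)

cinder_detach_completes("vol-1")
# nova_terminate("inst-1") would now raise, leaving the stale BDM in
# place -- Nova keeps reporting the volume as attached.
```

In the real system the two writes are separated by RPC round-trips, which is why the window only shows up under bulk deletions.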

The example below shows an instance in ERROR state (some nova show output truncated for brevity), while Cinder reports that the volume Nova believes is attached is actually available:

+--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property | Value |
+--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | --- |
| OS-EXT-SRV-ATTR:host | --- |
| OS-EXT-SRV-ATTR:hypervisor_hostname | --- |
| OS-EXT-SRV-ATTR:instance_name | instance-0000427f |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | - |
| OS-EXT-STS:vm_state | error |
| OS-SRV-USG:launched_at | 2015-09-22T15:39:06.000000 |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| config_drive | |
| created | 2015-09-22T14:04:40Z |
| flavor | m1.small (2) |
| hostId | e2d6789d6505366a3d78097d4c0bf99b8f30dd49b5561b1f94e0cdcb |
| id | 247d9ebe-6794-4119-b677-d990fd6244b1 |
| image | Red Hat Enterprise Linux 7 (current) (297e1979-cfa0-4596-8446-95c29ac640cc) |
| key_name | --- |
| metadata | {} |
| name | r_img_test_z2_1 |
| os-extended-volumes:volumes_attached | [{"id": "7adab934-360f-4cd0-9c98-7d6bea1554b3"}] |
| security_groups | default |
| status | ERROR |
| tenant_id | a9421c18f6fc48fea53c5685654f625a |
| updated | 2015-09-23T13:51:10Z |
| user_id | --- |
+--------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

+--------------------------------------+----------------------------------+-----------+--------------+------+-------------+----------+-------------+
| ID | Tenant ID | Status | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+----------------------------------+-----------+--------------+------+-------------+----------+-------------+
| 7adab934-360f-4cd0-9c98-7d6bea1554b3 | a9421c18f6fc48fea53c5685654f625a | available | test2 | 20 | SSD | false | |
+--------------------------------------+----------------------------------+-----------+--------------+------+-------------+----------+-------------+

Attempting to delete the instance while in this state produces the following traceback (from the nova-compute manager, surfaced via nova client):

File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 343, in decorated_function
  return function(self, context, *args, **kwargs)
File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2737, in terminate_instance
  do_terminate_instance(instance, bdms)
File "/usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 445, in inner
  return f(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2735, in do_terminate_instance
  self._set_instance_error_state(context, instance)
File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 85, in __exit__
  six.reraise(self.type_, self.value, self.tb)
File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2725, in do_terminate_instance
  self._delete_instance(context, instance, bdms, quotas)
File "/usr/lib/python2.7/dist-packages/nova/hooks.py", line 149, in inner
  rv = f(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2694, in _delete_instance
  quotas.rollback()
File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 85, in __exit__
  six.reraise(self.type_, self.value, self.tb)
File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2664, in _delete_instance
  self._shutdown_instance(context, instance, bdms)
File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2604, in _shutdown_instance
  self.volume_api.detach(context, bdm.volume_id)
File "/usr/lib/python2.7/dist-packages/nova/volume/cinder.py", line 214, in wrapper
  res = method(self, ctx, volume_id, *args, **kwargs)
File "/usr/lib/python2.7/dist-packages/nova/volume/cinder.py", line 365, in detach
  cinderclient(context).volumes.detach(volume_id)
File "/usr/lib/python2.7/dist-packages/cinderclient/v2/volumes.py", line 334, in detach
  return self._action('os-detach', volume)
File "/usr/lib/python2.7/dist-packages/cinderclient/v2/volumes.py", line 311, in _action
  return self.api.client.post(url, body=body)
File "/usr/lib/python2.7/dist-packages/cinderclient/client.py", line 91, in post
  return self._cs_request(url, 'POST', **kwargs)
File "/usr/lib/python2.7/dist-packages/cinderclient/client.py", line 85, in _cs_request
  return self.request(url, method, **kwargs)
File "/usr/lib/python2.7/dist-packages/cinderclient/client.py", line 80, in request
  return super(SessionClient, self).request(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/keystoneclient/adapter.py", line 200, in request
  resp = super(LegacyJsonAdapter, self).request(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/keystoneclient/adapter.py", line 89, in request
  return self.session.request(url, method, **kwargs)
File "/usr/lib/python2.7/dist-packages/keystoneclient/utils.py", line 318, in inner
  return func(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/keystoneclient/session.py", line 389, in request
  raise exceptions.from_response(resp, method, url)

Cinder logs an exception stating that the volume has no attachments:

2015-09-23 09:49:47.923 42963 ERROR cinder.volume.manager [req-4a2f6878-54ec-45a9-abcf-f4439a6c2de8 cloud_ll2 e000f9d1f3bf4f758e1d77c94da61281 - - -] Volume 7adab934-360f-4cd0-9c98-7d6bea1554b3 doesn't have any attachments to detach
2015-09-23 09:49:47.924 42963 ERROR oslo_messaging.rpc.dispatcher [req-4a2f6878-54ec-45a9-abcf-f4439a6c2de8 cloud_ll2 e000f9d1f3bf4f758e1d77c94da61281 - - -] Exception during message handling: Invalid volume: Volume 7adab934-360f-4cd0-9c98-7d6bea1554b3 doesn't have any attachments to detach
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher executor_callback))
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher executor_callback)
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 105, in wrapper
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher return f(*args, **kwargs)
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 157, in ldo_inner1
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher return ldo_inner2(inst, context, volume_id, attachment_id, **kwargs)
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 445, in inner
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher return f(*args, **kwargs)
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 156, in ldo_inner2
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher return f(*_args, **_kwargs)
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 882, in detach_volume
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher raise exception.InvalidVolume(reason=msg)
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher InvalidVolume: Invalid volume: Volume 7adab934-360f-4cd0-9c98-7d6bea1554b3 doesn't have any attachments to detach
2015-09-23 09:49:47.924 42963 TRACE oslo_messaging.rpc.dispatcher

While in this state, Nova refuses to take any action on the instance, including force deletion. Removing the volume via cinder delete unsticks things; once the offending volume is gone, nova delete removes the instance normally.

tags: added: compute volumes
Changed in nova:
assignee: nobody → gundarapu kalyan reddy (gundarapu-reddy)
Revision history for this message
Scott DAngelo (scott-dangelo) wrote :

Yes, there is a race in this code path. Nikola had expressed an interest in removing the volume state checks in Nova (i.e. check_attach, check_attached, check_detached) which would help eliminate this race. We'll also need some "try:" blocks in the Nova detach_volume code, and there is work in Cinder to use compare-and-swap on DB transactions to close the race on the Cinder side. There's work planned in the longer time frame for changes to Cinder API and how Nova consumes it that should help as well.
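The compare-and-swap approach mentioned above can be sketched as a conditional database update: the status transition only happens if the row is still in the expected state, so a concurrent detach loses cleanly instead of raising after the fact. A minimal illustration using SQLite (hypothetical schema and function name, not Cinder's actual code):

```python
import sqlite3

def detach_volume_cas(conn, volume_id):
    """Atomically move a volume from 'in-use' to 'available'.

    Returns True if this caller performed the detach, False if the
    volume was already detached (e.g. by a concurrent request).
    """
    cur = conn.execute(
        "UPDATE volumes SET status = 'available' "
        "WHERE id = ? AND status = 'in-use'",
        (volume_id,))
    conn.commit()
    # rowcount == 0 means the WHERE clause matched nothing: the volume
    # was no longer 'in-use', so another actor already detached it.
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE volumes (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO volumes VALUES ('vol-1', 'in-use')")

first = detach_volume_cas(conn, "vol-1")   # this caller wins the race
second = detach_volume_cas(conn, "vol-1")  # already available: no-op
```

The key property is that the check and the write are one statement, so there is no window between "read status" and "update status" for another request to slip through.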

Changed in nova:
assignee: gundarapu kalyan reddy (gundarapu-reddy) → nobody
Changed in nova:
status: New → Confirmed
Revision history for this message
John Garbutt (johngarbutt) wrote :

Hopefully this will get resolved as part of the current Cinder<->Nova API rework.

As an aside, it would be nice to have more information about the logs that cause the instance to go into the ERROR state, and on whether the volume is actually still attached to the VM or Nova is simply out of sync with reality.

Changed in nova:
importance: Undecided → High
Lee Yarwood (lyarwood)
Changed in nova:
status: Confirmed → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired