Volume detach fails if there are multiple BDM entries
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Expired | Undecided | Unassigned |
Bug Description
Steps to reproduce:
1. Attaching a volume to an instance fails because of an RPC timeout when nova-api calls nova-compute to create the BDM.
2. Attaching the same volume to the same instance succeeds on the second attempt.
3. There are now two BDMs for this volume, and one of them has an empty connection_info. When we try to detach the volume, an error is thrown because of the stale BDM entry created in step 1 (the failing decode is sketched after the traceback below):
[req-b14eb2a2-...]
Traceback (most recent call last):
  ... (file paths and most intermediate RPC-dispatch and nova-compute frames are truncated in the original report)
  File "/usr/lib/...", in ...
    connection_info = jsonutils. ...
  File "/usr/lib/...", in ...
    return json.loads( ...
  File "/usr/lib/...", in ...
    raise TypeError("%s can't be decoded" % type(text))
TypeError: <type 'NoneType'> can't be decoded
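
The failure visible at the bottom of the traceback is that the stale BDM row carries connection_info = NULL and the detach path tries to JSON-decode that value. The following is a minimal, standalone sketch of that decode step (illustrative Python only, not nova's actual code; safe_decode here just mimics the oslo-style type guard named in the error message):

```python
# Minimal sketch (not nova code): why a BDM row whose connection_info is
# None blows up when the detach path tries to deserialize it.
import json

def safe_decode(text):
    # Mirrors the guard seen in the traceback: only str/bytes can be
    # decoded; anything else (e.g. None) raises TypeError.
    if not isinstance(text, (str, bytes)):
        raise TypeError("%s can't be decoded" % type(text))
    return text if isinstance(text, str) else text.decode("utf-8")

stale_bdm = {"volume_id": "vol-123", "connection_info": None}  # entry from step 1
good_bdm = {"volume_id": "vol-123",
            "connection_info": '{"driver_volume_type": "iscsi"}'}  # entry from step 2

for bdm in (stale_bdm, good_bdm):
    try:
        info = json.loads(safe_decode(bdm["connection_info"]))
        print("decoded connection_info:", info)
    except TypeError as exc:
        print("detach would fail here:", exc)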
It is not easy to catch the timeout and then delete the BDM entry, because the entry may only be created after the timeout has already fired (we have seen this in our environment). We might also accidentally delete an entry created by a concurrent attach request.
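
One way a reader might instead handle the duplicates at detach time (purely a sketch with an assumed row layout and hypothetical helper name, not something nova does or that this report proposes): prefer the BDM whose connection_info is populated and treat null-connection_info duplicates for the same instance/volume pair as stale.

```python
# Hypothetical cleanup sketch, not nova code: given all BDM rows for one
# (instance, volume) pair, pick the usable one and flag stale duplicates.
def split_bdms(bdms):
    """Return (usable_bdm, stale_bdms) for a single instance/volume pair."""
    usable = [b for b in bdms if b.get("connection_info")]
    stale = [b for b in bdms if not b.get("connection_info")]
    if not usable:
        raise RuntimeError("no BDM with connection_info; nothing to detach")
    # If the attach path ever produced two populated rows, a different
    # tie-breaker (e.g. newest updated_at) would be needed; assume one here.
    return usable[0], stale

bdms = [
    {"id": 101, "volume_id": "vol-123", "connection_info": None},  # from step 1
    {"id": 102, "volume_id": "vol-123",
     "connection_info": '{"driver_volume_type": "iscsi"}'},         # from step 2
]
good, stale = split_bdms(bdms)
print("detach using BDM", good["id"],
      "; treat as stale:", [b["id"] for b in stale])
```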
Need more information about the environment: which version of nova is this? If master, what is the git hash? Which virt driver?