_local_delete results in inconsistent volume state in DB

Bug #1415778 reported by Bin Zhou
This bug affects 3 people
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

    When the nova-compute service is down, deleting an instance calls _local_delete in the nova-api service, which deletes the instance from the DB, terminates the volume connection, detaches the volume, and destroys the BDM.
    However, _local_delete sets connector = {'ip': '127.0.0.1', 'initiator': 'iqn.fake'} when calling terminate_connection. This raises an exception, leaving the volume status "in-use" and the volume still attached to the instance, even though the instance and BDM have already been deleted from the nova DB. The result is an inconsistent state: the BDM is gone from nova, but the volume is still in use according to Cinder.
    Because the nova-compute service is down, we cannot obtain the correct connector for the host. If we recorded the connector in the BDM while attaching the volume, it could be read back from the BDM during the local delete, allowing terminate_connection, detach, and the rest of the cleanup to succeed (a sketch follows below).
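
    A minimal sketch of the proposed approach, anticipating the Mitaka fix discussed below: the connector gathered on the compute host at attach time is stashed inside bdm.connection_info and read back during the local delete. The helper names here are illustrative, not actual nova code:

        from oslo_serialization import jsonutils

        def stash_connector(bdm, connector, connection_info):
            # Record the connector that was gathered on the compute host
            # at attach time, so it survives an outage of that host.
            connection_info['connector'] = connector
            bdm.connection_info = jsonutils.dumps(connection_info)
            bdm.save()

        def get_stashed_connector(bdm):
            # During _local_delete, read the connector back from the BDM.
            # Returns None for volumes attached before stashing existed.
            if not bdm.connection_info:
                return None
            return jsonutils.loads(bdm.connection_info).get('connector')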

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Which version of OpenStack? Are there any stack traces?

Changed in nova:
status: New → Incomplete
Revision history for this message
Bin Zhou (binzhou) wrote :

version: OpenStack Compute (nova) 2014.1 "icehouse"
stack traces:
2015-02-02 09:33:10.936 10739 ERROR oslo.messaging.rpc.dispatcher [req-33c22474-43a2-4d27-815c-ef6432ccfe82 4ae3fd1b211349498fbe3aaed423d653 71f90f295e8d40a4b1638049ce07c697 - - -] Exception during message handling: Bad or unexpected response from the storage volume backend API: Unable to terminate volume connection: _map_lun:get map group info fail.group name:HostGroup_R8949317243996453135 with ret code: 16916997
2015-02-02 09:33:10.936 10739 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
2015-02-02 09:33:10.936 10739 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 133, in _dispatch_and_reply
2015-02-02 09:33:10.936 10739 TRACE oslo.messaging.rpc.dispatcher incoming.message))
2015-02-02 09:33:10.936 10739 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 176, in _dispatch
2015-02-02 09:33:10.936 10739 TRACE oslo.messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args)
2015-02-02 09:33:10.936 10739 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 122, in _do_dispatch
2015-02-02 09:33:10.936 10739 TRACE oslo.messaging.rpc.dispatcher result = getattr(endpoint, method)(ctxt, **new_args)
2015-02-02 09:33:10.936 10739 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 999, in terminate_connection
2015-02-02 09:33:10.936 10739 TRACE oslo.messaging.rpc.dispatcher raise exception.VolumeBackendAPIException(data=err_msg)
2015-02-02 09:33:10.936 10739 TRACE oslo.messaging.rpc.dispatcher VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Unable to terminate volume connection: _map_lun:get map group info fail.group name:HostGroup_R8949317243996453135 with ret code: 16916997
2015-02-02 09:33:10.936 10739 TRACE oslo.messaging.rpc.dispatcher
2015-02-02 09:33:10.937 10739 ERROR oslo.messaging._drivers.common [req-33c22474-43a2-4d27-815c-ef6432ccfe82 4ae3fd1b211349498fbe3aaed423d653 71f90f295e8d40a4b1638049ce07c697 - - -] Returning exception Bad or unexpected response from the storage volume backend API: Unable to terminate volume connection: _map_lun:get map group info fail.group name:HostGroup_R8949317243996453135 with ret code: 16916997 to caller
2015-02-02 09:33:10.937 10739 ERROR oslo.messaging._drivers.common [req-33c22474-43a2-4d27-815c-ef6432ccfe82 4ae3fd1b211349498fbe3aaed423d653 71f90f295e8d40a4b1638049ce07c697 - - -] ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 133, in _dispatch_and_reply\n incoming.message))\n', ' File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 176, in _dispatch\n return self._do_dispatch(endpoint, method, ctxt, args)\n', ' File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 122, in _do_dispatch\n result = getattr(endpoint, method)(ctxt, **new_args)\n', ' File "/...


Revision history for this message
Bin Zhou (binzhou) wrote :

This bug also appears in the OpenStack Juno and Kilo releases.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired
Lee Yarwood (lyarwood)
Changed in nova:
status: Expired → Confirmed
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Lee Yarwood (<email address hidden>) on branch: master
Review: https://review.openstack.org/340951
Reason: https://review.openstack.org/#/c/257853/

Changed in nova:
assignee: nobody → prameela kapuganti (prameela)
Revision history for this message
prameela kapuganti (prameela) wrote :

I request the bug reporter to close this bug, as it was fixed in the Mitaka release. Below is a detailed analysis and the delta between the Kilo and Mitaka code related to this bug.

My Analysis:

1) Existing code snippet in Kilo:

In the _local_delete method (/opt/stack/nova/nova/compute/api.py):

connector = {'ip': '127.0.0.1', 'initiator': 'iqn.fake'}
try:
    self.volume_api.terminate_connection(context, bdm.volume_id,
                                         connector)
    self.volume_api.detach(elevated, bdm.volume_id)

# Here we pass a fake connector (loopback IP 127.0.0.1, fake initiator) to terminate_connection and detach. The storage backend cannot match this connector to any real host mapping, so the call fails, the volume is never detached from the instance, and its status stays "in-use" (which is not expected).
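
For context, a real connector must be gathered on the compute host itself (nova's virt driver wraps os-brick for this), which is exactly why nova-api cannot rebuild it while the host is down. A hedged sketch using os-brick's get_connector_properties; the argument values here are illustrative:

    from os_brick.initiator import connector as brick_connector

    # This must run on the compute host: it reads the local initiator
    # name, host IP, WWPNs, etc., none of which nova-api can collect
    # remotely while the host is down.
    props = brick_connector.get_connector_properties(
        'sudo nova-rootwrap /etc/nova/rootwrap.conf',  # root_helper
        '192.0.2.10',                                  # this host's IP
        False,                                         # multipath
        False)                                         # enforce_multipath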

2) Already fixed in Mitaka in the following way:

# A new method was added to obtain the connector, and the fake IP (127.0.0.1) was removed.

In the _get_stashed_volume_connector method (/opt/stack/nova/nova/compute/api.py):

connector = jsonutils.loads(bdm.connection_info).get('connector')
if connector:
    if connector.get('host') == instance.host:
        return connector

# This returns the stashed connector dict from bdm.connection_info, if it is set and the connector's host matches the instance's host. It is called from the _local_cleanup_bdm_volumes method before performing terminate_connection and detach.
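
A minimal sketch of how the cleanup path can use the stashed connector (simplified; only _get_stashed_volume_connector, terminate_connection, and detach are taken from the code cited above, the surrounding loop is illustrative):

    for bdm in bdms:
        if bdm.is_volume:
            connector = self._get_stashed_volume_connector(bdm, instance)
            if connector:
                # Terminate with the connector recorded at attach time,
                # so the backend can locate the real host mapping.
                self.volume_api.terminate_connection(context,
                                                     bdm.volume_id,
                                                     connector)
            self.volume_api.detach(context, bdm.volume_id)
        bdm.destroy()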

REFERENCED FILES:

 /opt/stack/nova/nova/compute/api.py
 /opt/stack/nova/nova/compute/manager.py

Sean Dague (sdague)
Changed in nova:
assignee: prameela kapuganti (prameela) → nobody