multipath iscsi sessions and device mapper entries are left behind when disconnecting volumes

Bug #1502534 reported by Patrick East
This bug affects 2 people
Affects: Cinder
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

On the stable/kilo branch, running the full tempest test suite sometimes leaves behind iSCSI sessions and multipath entries from cinder, such as:

stack@devstack:~/tempest$ sudo iscsiadm -m session
tcp: [135] 10.0.5.10:3260,1 iqn.2010-06.com.purestorage:flasharray.3adbe40b49bac873
tcp: [136] 10.0.1.11:3260,1 iqn.2010-06.com.purestorage:flasharray.3adbe40b49bac873
tcp: [137] 10.0.5.11:3260,1 iqn.2010-06.com.purestorage:flasharray.3adbe40b49bac873
stack@devstack:~/tempest$ sudo multipath -l
3624a93709a738ed78583fd1200131aa1 dm-6 ,
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-1 status=enabled
  |- #:#:#:# - #:# active undef running
  `- #:#:#:# - #:# active undef running
3624a93709a738ed78583fd1200131aa0 dm-7 ,
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-1 status=active
  |- #:#:#:# - #:# active undef running
  |- #:#:#:# - #:# active undef running
  |- #:#:#:# - #:# active undef running
  `- #:#:#:# - #:# active undef running
3624a93709a738ed78583fd1200131a9f dm-9 ,
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-1 status=active
  |- #:#:#:# - #:# active undef running
  |- #:#:#:# - #:# active undef running
  |- #:#:#:# - #:# active undef running
  `- #:#:#:# - #:# active undef running
3624a93709a738ed78583fd1200131a9e dm-3 ,
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-1 status=active
  |- #:#:#:# - #:# active undef running
  |- #:#:#:# - #:# active undef running
  |- #:#:#:# - #:# active undef running
  `- #:#:#:# - #:# active undef running
3624a93709a738ed78583fd1200131a9d dm-2 ,
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-1 status=active
  |- #:#:#:# - #:# active undef running
  |- #:#:#:# - #:# active undef running
  |- #:#:#:# - #:# active undef running
  `- #:#:#:# - #:# active undef running
3624a93709a738ed78583fd1200131a9c dm-1 ,
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-1 status=active
  |- #:#:#:# - #:# active undef running
  |- #:#:#:# - #:# active undef running
  |- #:#:#:# - #:# active undef running
  `- #:#:#:# - #:# active undef running
3624a93709a738ed78583fd1200131a9b dm-0 ,
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-1 status=active
  |- #:#:#:# - #:# active undef running
  |- #:#:#:# - #:# active undef running
  |- #:#:#:# - #:# active undef running
  `- #:#:#:# - #:# active undef running
3624a93709a738ed78583fd1200131aa4 dm-8 ,
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-1 status=active
  |- #:#:#:# - #:# active undef running
  `- #:#:#:# - #:# active undef running
3624a93709a738ed78583fd1200131aa3 dm-5 ,
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-1 status=active
  |- #:#:#:# - #:# active undef running
  |- #:#:#:# - #:# active undef running
  |- #:#:#:# - #:# active undef running
  `- #:#:#:# - #:# active undef running
3624a93709a738ed78583fd1200131aa2 dm-4 ,
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-1 status=active
  |- #:#:#:# - #:# active undef running
  |- #:#:#:# - #:# active undef running
  |- #:#:#:# - #:# active undef running
  `- #:#:#:# - #:# active undef running

This shows quite a few 'orphaned' multipath entries and some iSCSI sessions that were not properly cleaned up. The issue is intermittent, but it can lead to errors such as:

2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher executor_callback))
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher executor_callback)
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 105, in wrapper
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher return f(*args, **kwargs)
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/manager.py", line 468, in create_volume
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher _run_flow()
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/manager.py", line 456, in _run_flow
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher flow_engine.run()
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/taskflow/engines/action_engine/engine.py", line 96, in run
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher for _state in self.run_iter():
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/taskflow/engines/action_engine/engine.py", line 153, in run_iter
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher failure.Failure.reraise_if_any(failures.values())
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/taskflow/types/failure.py", line 244, in reraise_if_any
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher failures[0].reraise()
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/taskflow/types/failure.py", line 251, in reraise
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher six.reraise(*self._exc_info)
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/taskflow/engines/action_engine/executor.py", line 67, in _execute_task
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher result = task.execute(**arguments)
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/flows/manager/create_volume.py", line 653, in execute
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher **volume_spec)
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/flows/manager/create_volume.py", line 605, in _create_from_image
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher image_id, image_location, image_service)
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/flows/manager/create_volume.py", line 516, in _copy_image_to_volume
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher raise exception.ImageCopyFailure(reason=ex)
2015-10-03 05:37:15.939 TRACE oslo_messaging.rpc.dispatcher ImageCopyFailure: Failed to copy image to volume: Volume device not found at [u'/dev/disk/by-path/ip-10.0.1.10:3260-iscsi-iqn.2010-06.com.purestorage:flasharray.3adbe40b49bac873-lun-9'].

This happens because sessions that are not logged out are left in an error state once the target array disconnects them (which it does when terminate_connection happens).
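
For anyone cleaning up a host by hand, here is a minimal sketch of how the leftover sessions could be turned into logout commands. It is not part of cinder or os-brick; the helper names `parse_sessions` and `logout_commands` are made up for illustration, and the parser assumes the `iscsiadm -m session` line format shown in the listing above. It only builds the argument lists and does not run anything, since logging out a session that still backs an attached volume would break that volume.

```python
import re

# Matches lines such as:
#   tcp: [135] 10.0.5.10:3260,1 iqn.2010-06.com.purestorage:flasharray.3adbe40b49bac873
# Groups: session id, portal (ip:port), target portal group tag, target IQN.
SESSION_RE = re.compile(r"^\w+: \[(\d+)\] (\S+?),(\d+) (\S+)")


def parse_sessions(output):
    """Return (session_id, portal, iqn) tuples from `iscsiadm -m session` output."""
    sessions = []
    for line in output.splitlines():
        match = SESSION_RE.match(line.strip())
        if match:
            sid, portal, _tpgt, iqn = match.groups()
            sessions.append((int(sid), portal, iqn))
    return sessions


def logout_commands(output):
    """Build one `iscsiadm ... --logout` argv per listed session.

    Nothing is executed here. The caller is expected to drop any session
    that is still in use before running the commands as root.
    """
    return [
        ["iscsiadm", "-m", "node", "-T", iqn, "-p", portal, "--logout"]
        for _sid, portal, iqn in parse_sessions(output)
    ]
```

The resulting lists can be passed to `subprocess.run` once it is confirmed that no attached volume depends on the session.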

These errors are not observed in the Liberty release, so the latest changes and bug fixes for os-brick seem to have corrected this behavior.
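
To compare branches after a tempest run, something like the following rough sketch could flag the orphaned maps automatically. The function name `orphaned_maps` is hypothetical. It assumes the `multipath -l` layout shown in the description (a `<wwid> dm-N` header line with no alias, followed by path lines) and treats a map as orphaned when every one of its paths is printed as the `#:#:#:#` placeholder.

```python
import re

# A map header looks like: "3624a93709a738ed78583fd1200131aa1 dm-6 ,"
MAP_RE = re.compile(r"^(\S+)\s+(dm-\d+)\b")
# A path line looks like: "  |- #:#:#:# - #:# active undef running"
# The captured group is the host:channel:id:lun field.
PATH_RE = re.compile(r"^[\s|`]*[|`]- (\S+)")


def orphaned_maps(output):
    """Return WWIDs of maps whose paths are all `#:#:#:#` placeholders."""
    maps = {}
    current = None
    for line in output.splitlines():
        header = MAP_RE.match(line)
        if header:
            current = header.group(1)
            maps[current] = []
            continue
        path = PATH_RE.match(line)
        if path and current is not None:
            maps[current].append(path.group(1))
    return [
        wwid
        for wwid, paths in maps.items()
        if paths and all(p == "#:#:#:#" for p in paths)
    ]
```

Feeding it the captured output of `sudo multipath -l` gives a list that should be empty on a clean host. Each reported WWID is a candidate for `multipath -f <wwid>` once it is confirmed that nothing is using the device.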

Changed in cinder:
assignee: nobody → Patrick East (patrick-east)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (stable/kilo)

Change abandoned by Patrick East (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/230781
Reason: I've split this up into each single back-port, please see https://review.openstack.org/#/q/status:open+project:openstack/cinder+branch:stable/kilo+topic:multipath-iscsi-cleanup,n,z for the full list.

Sean McGinnis (sean-mcginnis) wrote : Owner Expired

Unassigning due to no activity.

Changed in cinder:
assignee: Patrick East (patrick-east) → nobody
Gorka Eguileor (gorka)
Changed in cinder:
assignee: nobody → Gorka Eguileor (gorka)
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-brick 1.14.0

This issue was fixed in the openstack/os-brick 1.14.0 release.

Sean McGinnis (sean-mcginnis) wrote : Bug Assignee Expired

Unassigning due to no activity for > 6 months.

Changed in cinder:
assignee: Gorka Eguileor (gorka) → nobody
Gorka Eguileor (gorka) wrote :

As mentioned in comment #3, this issue was fixed in the openstack/os-brick 1.14.0 release.

Changed in cinder:
status: New → Fix Released