NetApp E-Series volume mapping fails for very close concurrent requests

Bug #1312685 reported by Navneet
Affects            Status        Importance  Assigned to  Milestone
Cinder             Fix Released  Undecided   Navneet
Cinder (Icehouse)  Fix Released  Undecided   Unassigned

Bug Description

The copy-image-to-volume operation can sometimes fail for two closely concurrent requests from the same Cinder node, because both volume mappings are assigned the same LUN number.

Below is the QA setup.

Setup:

$ dd if=/dev/zero of=/tmp/onegig bs=1M count=1024
$ glance image-create --name onegig --disk-format raw --container-format bare --progress </tmp/onegig
$ cinder create --image-id 8e26c1ac-c7dd-4454-8312-58857b0e3317 --display-name fromONEGIG_2 2
$ cinder create --image-id 8e26c1ac-c7dd-4454-8312-58857b0e3317 --display-name fromONEGIG_3 2

The creation of the fromONEGIG_3 volume above fails with the following, because LUN 0 is already in use by the first, still-ongoing operation:

2014-04-21 13:04:33.885 WARNING cinder.brick.initiator.connector [req-0d2e61ac-cf02-4311-96c0-dee3b6e20b97 0ecd8a92b6814937b584f0dce8eb935a 0058191e793e4e3b8ca6d65eb49cfb3a] ISCSI volume not yet found at: /dev/disk/by-path/ip-10.63.165.220:3260-iscsi-iqn.1992-01.com.lsi:2365.60080e500023c73400000000515af323-lun-0. Will rescan & retry. Try number: 0
2014-04-21 13:04:38.861 ERROR cinder.volume.driver [req-c468b903-bd6d-4639-a1bc-34687fa5f537 0ecd8a92b6814937b584f0dce8eb935a 0058191e793e4e3b8ca6d65eb49cfb3a] Unable to fetch connection information from backend: Response error - {
2014-04-21 13:04:38.862 ERROR cinder.volume.flows.manager.create_volume [req-c468b903-bd6d-4639-a1bc-34687fa5f537 0ecd8a92b6814937b584f0dce8eb935a 0058191e793e4e3b8ca6d65eb49cfb3a] Failed to copy image 8e26c1ac-c7dd-4454-8312-58857b0e3317 to volume: ddb08628-3886-40b5-a2d6-3946b636f4cb, error: Bad or unexpected response from the storage volume backend API: Unable to fetch connection information from backend: Response error - {
2014-04-21 13:04:45.212 ERROR cinder.volume.flows.manager.create_volume [req-c468b903-bd6d-4639-a1bc-34687fa5f537 0ecd8a92b6814937b584f0dce8eb935a 0058191e793e4e3b8ca6d65eb49cfb3a] Volume ddb08628-3886-40b5-a2d6-3946b636f4cb: create failed
2014-04-21 13:04:45.214 ERROR oslo.messaging.rpc.dispatcher [req-c468b903-bd6d-4639-a1bc-34687fa5f537 0ecd8a92b6814937b584f0dce8eb935a 0058191e793e4e3b8ca6d65eb49cfb3a] Exception during message handling: Failed to copy image to volume: Bad or unexpected response from the storage volume backend API: Unable to fetch connection information from backend: Response error - {
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py", line 133, in _dispatch_and_reply
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher incoming.message))
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py", line 176, in _dispatch
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args)
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py", line 122, in _do_dispatch
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher result = getattr(endpoint, method)(ctxt, **new_args)
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/manager.py", line 363, in create_volume
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher _run_flow()
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/manager.py", line 356, in _run_flow
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher flow_engine.run()
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/site-packages/taskflow/utils/lock_utils.py", line 53, in wrapper
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher return f(*args, **kwargs)
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/site-packages/taskflow/engines/action_engine/engine.py", line 111, in run
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher self._run()
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/site-packages/taskflow/engines/action_engine/engine.py", line 121, in _run
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher self._revert(misc.Failure())
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/site-packages/taskflow/engines/action_engine/engine.py", line 78, in _revert
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher misc.Failure.reraise_if_any(failures.values())
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/site-packages/taskflow/utils/misc.py", line 558, in reraise_if_any
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher failures[0].reraise()
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/site-packages/taskflow/utils/misc.py", line 565, in reraise
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher six.reraise(*self._exc_info)
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/site-packages/taskflow/engines/action_engine/executor.py", line 36, in _execute_task
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher result = task.execute(**arguments)
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/flows/manager/create_volume.py", line 594, in execute
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher **volume_spec)
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/flows/manager/create_volume.py", line 556, in _create_from_image
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher image_id, image_location, image_service)
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/flows/manager/create_volume.py", line 470, in _copy_image_to_volume
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher raise exception.ImageCopyFailure(reason=ex)
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher ImageCopyFailure: Failed to copy image to volume: Bad or unexpected response from the storage volume backend API: Unable to fetch connection information from backend: Response error - {
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher "errorMessage" : "The operation cannot complete because the logical unit number (LUN) is already in use. Please enter another LUN.",
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher "localizedMessage" : "The operation cannot complete because the logical unit number (LUN) is already in use. Please enter another LUN.",
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher "retcode" : "62",
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher "codeType" : "symbol"
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher }.
2014-04-21 13:04:45.214 TRACE oslo.messaging.rpc.dispatcher
2014-04-21 13:04:45.623 ERROR oslo.messaging._drivers.common [req-c468b903-bd6d-4639-a1bc-34687fa5f537 0ecd8a92b6814937b584f0dce8eb935a 0058191e793e4e3b8ca6d65eb49cfb3a] Returning exception Failed to copy image to volume: Bad or unexpected response from the storage volume backend API: Unable to fetch connection information from backend: Response error - {
2014-04-21 13:04:45.624 ERROR oslo.messaging._drivers.common [req-c468b903-bd6d-4639-a1bc-34687fa5f537 0ecd8a92b6814937b584f0dce8eb935a 0058191e793e4e3b8ca6d65eb49cfb3a] ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py", line 133, in _dispatch_and_reply\n incoming.message))\n', ' File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py", line 176, in _dispatch\n return self._do_dispatch(endpoint, method, ctxt, args)\n', ' File "/usr/lib/python2.6/site-packages/oslo/messaging/rpc/dispatcher.py", line 122, in _do_dispatch\n result = getattr(endpoint, method)(ctxt, **new_args)\n', ' File "/opt/stack/cinder/cinder/volume/manager.py", line 363, in create_volume\n _run_flow()\n', ' File "/opt/stack/cinder/cinder/volume/manager.py", line 356, in _run_flow\n flow_engine.run()\n', ' File "/usr/lib/python2.6/site-packages/taskflow/utils/lock_utils.py", line 53, in wrapper\n return f(*args, **kwargs)\n', ' File "/usr/lib/python2.6/site-packages/taskflow/engines/action_engine/engine.py", line 111, in run\n self._run()\n', ' File "/usr/lib/python2.6/site-packages/taskflow/engines/action_engine/engine.py", line 121, in _run\n self._revert(misc.Failure())\n', ' File "/usr/lib/python2.6/site-packages/taskflow/engines/action_engine/engine.py", line 78, in _revert\n misc.Failure.reraise_if_any(failures.values())\n', ' File "/usr/lib/python2.6/site-packages/taskflow/utils/misc.py", line 558, in reraise_if_any\n failures[0].reraise()\n', ' File "/usr/lib/python2.6/site-packages/taskflow/utils/misc.py", line 565, in reraise\n six.reraise(*self._exc_info)\n', ' File "/usr/lib/python2.6/site-packages/taskflow/engines/action_engine/executor.py", line 36, in _execute_task\n result = task.execute(**arguments)\n', ' File "/opt/stack/cinder/cinder/volume/flows/manager/create_volume.py", line 594, in execute\n **volume_spec)\n', ' File "/opt/stack/cinder/cinder/volume/flows/manager/create_volume.py", line 556, in 
_create_from_image\n image_id, image_location, image_service)\n', ' File "/opt/stack/cinder/cinder/volume/flows/manager/create_volume.py", line 470, in _copy_image_to_volume\n raise exception.ImageCopyFailure(reason=ex)\n', 'ImageCopyFailure: Failed to copy image to volume: Bad or unexpected response from the storage volume backend API: Unable to fetch connection information from backend: Response error - {\n "errorMessage" : "The operation cannot complete because the logical unit number (LUN) is already in use. Please enter another LUN.",\n "localizedMessage" : "The operation cannot complete because the logical unit number (LUN) is already in use. Please enter another LUN.",\n "retcode" : "62",\n "codeType" : "symbol"\n}.\n']
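
The failure above is a classic check-then-act race: each request scans the host's existing mappings, sees LUN 0 as free, and tries to map at LUN 0; the backend rejects the second mapping with retcode 62. A minimal sketch of the racy pattern (hypothetical names, not the actual driver code):

```python
used_luns = set()          # LUNs already mapped to the host (backend state)

def pick_free_lun():
    # Check step: scan for the lowest LUN not yet in use -- no locking.
    lun = 0
    while lun in used_luns:
        lun += 1
    return lun

def create_mapping(lun):
    # Act step: the backend rejects a LUN number that is already taken.
    if lun in used_luns:
        raise RuntimeError("retcode 62: LUN %d is already in use" % lun)
    used_luns.add(lun)

# Two concurrent requests both run the check step before either one acts:
lun_a = pick_free_lun()    # request 1 sees LUN 0 as free
lun_b = pick_free_lun()    # request 2 also sees LUN 0 as free
create_mapping(lun_a)      # request 1 maps at LUN 0
try:
    create_mapping(lun_b)  # request 2 tries LUN 0 too and is rejected
    raced = False
except RuntimeError:
    raced = True
```

The window between `pick_free_lun` and `create_mapping` is exactly where the two copy-image-to-volume requests interleave.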

Navneet (singn)
Changed in cinder:
assignee: nobody → Navneet (singn)
Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/90351

Changed in cinder:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/90351
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=209c09a76f5e90aaa2899804686e6a513703d887
Submitter: Jenkins
Branch: master

commit 209c09a76f5e90aaa2899804686e6a513703d887
Author: Navneet Singh <email address hidden>
Date: Fri Feb 28 15:49:49 2014 +0530

    NetApp fix eseries concurrent vol map failure

    This fixes the issue while mapping volume to host
    when the requests are very close in time proximity.
    The map operation has been synchronized.

    Change-Id: I2e120d84d3b96faa4ab8d4324e70a089bcddeaa5
    Closes-bug: #1312685
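
The fix serializes the pick-a-LUN-and-map sequence so two requests can no longer interleave between choosing a LUN and creating the mapping. A hedged sketch of that idea using a plain `threading.Lock` (the real driver uses Cinder's synchronization utilities; the names here are illustrative):

```python
import threading

used_luns = set()              # backend mapping state shared by requests
_map_lock = threading.Lock()   # serializes the whole pick-and-map sequence

def map_volume_to_host():
    # Holding the lock makes "pick lowest free LUN" and "create mapping"
    # one atomic step, so concurrent requests cannot pick the same LUN.
    with _map_lock:
        lun = 0
        while lun in used_luns:
            lun += 1
        used_luns.add(lun)     # stands in for the backend mapping call
        return lun

# Even with many closely concurrent requests, every assigned LUN is unique:
results = []
threads = [threading.Thread(target=lambda: results.append(map_volume_to_host()))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The trade-off is that mapping requests on one node are now processed one at a time, which is acceptable since the mapping call itself is short compared to the image copy.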

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
milestone: none → juno-1
status: Fix Committed → Fix Released
Navneet (singn)
tags: added: icehouse-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/127984

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/icehouse)

Reviewed: https://review.openstack.org/127984
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=0b8455aa99248add5b3f26d3fc7050a657696c2d
Submitter: Jenkins
Branch: stable/icehouse

commit 0b8455aa99248add5b3f26d3fc7050a657696c2d
Author: Navneet Singh <email address hidden>
Date: Fri Feb 28 15:49:49 2014 +0530

    NetApp fix eseries concurrent vol map failure

    This fixes the issue while mapping volume to host
    when the requests are very close in time proximity.
    The map operation has been synchronized.

    Change-Id: I2e120d84d3b96faa4ab8d4324e70a089bcddeaa5
    Closes-bug: #1312685
    (cherry picked from commit 209c09a76f5e90aaa2899804686e6a513703d887)

tags: added: in-stable-icehouse
Thierry Carrez (ttx)
Changed in cinder:
milestone: juno-1 → 2014.2
Eric Harney (eharney)
tags: removed: icehouse-backport-potential