Bug #1398078 “test_delete_server_while_in_attached_volume failed...” : Bugs : Cinder

Revision history for this message

Matt Riedemann (mriedem) wrote on 2014-12-01:

#1

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiVm9sdW1lIE5vbmUgZmFpbGVkIHRvIHJlYWNoIGluLXVzZSBzdGF0dXMgd2l0aGluIHRoZSByZXF1aXJlZCB0aW1lXCIgQU5EIHRhZ3M6XCJjb25zb2xlXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0MTc0NTE3OTc1MTIsIm1vZGUiOiIiLCJhbmFseXplX2ZpZWxkIjoiIn0=

13 hits in 7 days, check and gate, all failures.

Revision history for this message

Matt Riedemann (mriedem) wrote on 2014-12-01:

#2

Download full text (4.1 KiB)

I see this in the cinder volume logs:

http://logs.openstack.org/17/117417/11/gate/gate-tempest-dsvm-neutron-pg-full/c3fe760/logs/screen-c-vol.txt.gz#_2014-11-26_14_16_03_152

2014-11-26 14:16:03.152 1390 WARNING cinder.brick.iscsi.iscsi [req-1c2046a2-0e5a-4a9c-9d11-b1f8f4a62e3d 233bb4a2bf5a463eb92255fda6d45011 92d6d94d6d384ea984ec2f838413a396 - - -] Failed to create iscsi target for volume id:volume-c1dfc318-a139-45a3-b235-f35d1b1fb9d0: Unexpected error while running command.
Command: None
Exit code: -
Stdout: u"Unexpected error while running command.\nCommand: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgt-admin --update iqn.2010-10.org.openstack:volume-c1dfc318-a139-45a3-b235-f35d1b1fb9d0\nExit code: 22\nStdout: u'Command:\\n\\ttgtadm -C 0 --lld iscsi --op new --mode target --tid 1 -T iqn.2010-10.org.openstack:volume-c1dfc318-a139-45a3-b235-f35d1b1fb9d0\\nexited with code: 22.\\n'\nStderr: u'tgtadm: this target already exists\\n'"
Stderr: None
2014-11-26 14:16:03.153 1390 ERROR oslo.messaging.rpc.dispatcher [req-1c2046a2-0e5a-4a9c-9d11-b1f8f4a62e3d 233bb4a2bf5a463eb92255fda6d45011 92d6d94d6d384ea984ec2f838413a396 - - -] Exception during message handling: Failed to create iscsi target for volume volume-c1dfc318-a139-45a3-b235-f35d1b1fb9d0.
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher incoming.message))
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args)
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher result = getattr(endpoint, method)(ctxt, **new_args)
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 105, in wrapper
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher return f(*args, **kwargs)
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/new/cinder/cinder/volume/manager.py", line 898, in initialize_connection
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher volume)
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 105, in wrapper
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher return f(*args, **kwargs)
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/new/cinder/cinder/volume/drivers/lvm.py", line 549, in create_export
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher return self._cr...

I see this in the cinder volume logs:

http://logs.openstack.org/17/117417/11/gate/gate-tempest-dsvm-neutron-pg-full/c3fe760/logs/screen-c-vol.txt.gz#_2014-11-26_14_16_03_152

2014-11-26 14:16:03.152 1390 WARNING cinder.brick.iscsi.iscsi [req-1c2046a2-0e5a-4a9c-9d11-b1f8f4a62e3d 233bb4a2bf5a463eb92255fda6d45011 92d6d94d6d384ea984ec2f838413a396 - - -] Failed to create iscsi target for volume id:volume-c1dfc318-a139-45a3-b235-f35d1b1fb9d0: Unexpected error while running command.
Command: None
Exit code: -
Stdout: u"Unexpected error while running command.\nCommand: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgt-admin --update iqn.2010-10.org.openstack:volume-c1dfc318-a139-45a3-b235-f35d1b1fb9d0\nExit code: 22\nStdout: u'Command:\\n\\ttgtadm -C 0 --lld iscsi --op new --mode target --tid 1 -T iqn.2010-10.org.openstack:volume-c1dfc318-a139-45a3-b235-f35d1b1fb9d0\\nexited with code: 22.\\n'\nStderr: u'tgtadm: this target already exists\\n'"
Stderr: None
2014-11-26 14:16:03.153 1390 ERROR oslo.messaging.rpc.dispatcher [req-1c2046a2-0e5a-4a9c-9d11-b1f8f4a62e3d 233bb4a2bf5a463eb92255fda6d45011 92d6d94d6d384ea984ec2f838413a396 - - -] Exception during message handling: Failed to create iscsi target for volume volume-c1dfc318-a139-45a3-b235-f35d1b1fb9d0.
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher     incoming.message))
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher     return self._do_dispatch(endpoint, method, ctxt, args)
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher     result = getattr(endpoint, method)(ctxt, **new_args)
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 105, in wrapper
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher     return f(*args, **kwargs)
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/cinder/cinder/volume/manager.py", line 898, in initialize_connection
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher     volume)
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 105, in wrapper
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher     return f(*args, **kwargs)
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/cinder/cinder/volume/drivers/lvm.py", line 549, in create_export
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher     return self._create_export(context, volume)
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/cinder/cinder/volume/drivers/lvm.py", line 561, in _create_export
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher     self.configuration)
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/cinder/cinder/volume/iscsi.py", line 62, in create_export
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher     conf.iscsi_write_cache)
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/cinder/cinder/brick/iscsi/iscsi.py", line 252, in create_iscsi_target
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher     raise exception.ISCSITargetCreateFailed(volume_id=vol_id)
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher ISCSITargetCreateFailed: Failed to create iscsi target for volume volume-c1dfc318-a139-45a3-b235-f35d1b1fb9d0.
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.rpc.dispatcher

Revision history for this message

Matt Riedemann (mriedem) wrote on 2014-12-01:

#3

So the volume uuid is c1dfc318-a139-45a3-b235-f35d1b1fb9d0, and it already exists when cinder tried to create it.

Revision history for this message

Matt Riedemann (mriedem) wrote on 2014-12-01:

#4

This seems to drop off of logstash after 11/27 but I don't see anything in cinder commits since then that look related. Maybe it was fixed in Tempest with this?

https://github.com/openstack/tempest/commit/f6bf3d5a33398bf7ea5c94173ce66530823a5009

Revision history for this message

Matt Riedemann (mriedem) wrote on 2014-12-01:

#5

Here is the problem in Tempest:

http://git.openstack.org/cgit/openstack/tempest/tree/tempest/services/compute/json/volumes_extensions_client.py?id=2#n96

That error message should be using the volume id (uuid) rather than the volume name, which is a different response between cinder v1 and v2.

Revision history for this message

Matt Riedemann (mriedem) wrote on 2014-12-01:

#6

Related Tempest timeout exception message fix: https://review.openstack.org/#/c/138126/

Revision history for this message

Matt Riedemann (mriedem) wrote on 2014-12-01:

#7

jgriffith pointed this out:

http://logs.openstack.org/17/117417/11/gate/gate-tempest-dsvm-neutron-pg-full/c3fe760/logs/screen-c-vol.txt.gz#_2014-11-26_14_16_02_494

Sounds like that shouldn't happen, bug since it's logged at debug level logstash won't pick it up so we don't know the correlation to gate failures. He's going to push a change to make that WARNING level so we can see if it's related.

The weird thing is that when looking at the command failure here:

http://logs.openstack.org/17/117417/11/gate/gate-tempest-dsvm-neutron-pg-full/c3fe760/logs/screen-c-vol.txt.gz#_2014-11-26_14_16_03_152

That's doing an op new but the code at:

File "/opt/stack/new/cinder/cinder/brick/iscsi/iscsi.py", line 252, in create_iscsi_target

is doing an op show. It seems like there is some processutils/oslo.concurrency weirdness in here.

Revision history for this message

Matt Riedemann (mriedem) wrote on 2014-12-01:

#8

There is another volume being successfully created at the same time as the one that fails, I'm wondering if there is shared global variables in this code or something that is contributing to the weird things we're seeing in the logs, where it looks like maybe something is getting mixed up in the flow.

Revision history for this message

Matt Riedemann (mriedem) wrote on 2014-12-01:

#9

This is a better logstash query:

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiRmFpbGVkIHRvIGNyZWF0ZSBpc2NzaSB0YXJnZXQgZm9yIHZvbHVtZVwiIEFORCBtZXNzYWdlOlwiVW5leHBlY3RlZCBlcnJvciB3aGlsZSBydW5uaW5nIGNvbW1hbmRcIiBBTkQgbWVzc2FnZTpcInRoaXMgdGFyZ2V0IGFscmVhZHkgZXhpc3RzXCIgQU5EIHRhZ3M6XCJzY3JlZW4tYy12b2wudHh0XCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0MTc0NTYyNDkzOTR9

Revision history for this message

Matt Riedemann (mriedem) wrote on 2014-12-01:

#10

e-r query: https://review.openstack.org/#/c/138135/

Changed in cinder:
importance:	Undecided → High
status:	New → Confirmed

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-12-01: Fix proposed to cinder (master)

#11

Fix proposed to branch: master
Review: https://review.openstack.org/138166

Changed in cinder:
assignee:	nobody → John Griffith (john-griffith)
status:	Confirmed → In Progress

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-12-01: Change abandoned on cinder (master)

#12

Change abandoned by John Griffith (<email address hidden>) on branch: master
Review: https://review.openstack.org/138166
Reason: splitting up into two patches

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-12-01: Fix proposed to cinder (master)

#13

Fix proposed to branch: master
Review: https://review.openstack.org/138173

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-12-02: Fix merged to cinder (master)

#14

Reviewed: https://review.openstack.org/138173
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=d7bd65e8e9ff800d370effc1a8d24ea900a5d435
Submitter: Jenkins
Branch: master

commit d7bd65e8e9ff800d370effc1a8d24ea900a5d435
Author: John Griffith <email address hidden>
Date: Mon Dec 1 13:29:02 2014 -0700

Deal with tgt already exists errors

    This patch intends to provide a clear marker for
    ER in the case that create export fails due to the
    target entry already existing.

Also this patch will enable us to go ahead and just use
the existing target rather than bomb out and fail everything.

    Root cause of why we're getting a second create is still
    unknown and needs addressed, but this might help in getting
    more info as well as keeping things stable until we address
    the root issue.

Change-Id: If96f4fab7d9709dbb0726b1dca6237d7bc66ecbe
Partial-Bug: #1398078

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-12-15: Related fix proposed to cinder (master)

#15

Related fix proposed to branch: master
Review: https://review.openstack.org/141886

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-12-16: Related fix merged to cinder (master)

#16

Reviewed: https://review.openstack.org/141886
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=eb20e9e5009541ea0037588a43443c74e8968c9c
Submitter: Jenkins
Branch: master

commit eb20e9e5009541ea0037588a43443c74e8968c9c
Author: Matt Riedemann <email address hidden>
Date: Mon Dec 15 11:23:42 2014 -0800

Add unit test for commit 22abe9081

Commit 22abe9081 fixed a bug in commit d7bd65e8e which we could have
found the first time with a simple unit test.

This change adds the unit test to cover that new error handling block.

Related-Bug: #1398078

Change-Id: I86d2fd477f3ae5590c5f079a4c806d1f50eb96d0

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-01-07: Fix proposed to cinder (master)

#17

Fix proposed to branch: master
Review: https://review.openstack.org/145608

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-01-07:

#18

Fix proposed to branch: master
Review: https://review.openstack.org/145630

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-01-08: Fix merged to cinder (master)

#19

Reviewed: https://review.openstack.org/145608
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=941e5f761be165accaabbdeec1608189b04ab67e
Submitter: Jenkins
Branch: master

commit 941e5f761be165accaabbdeec1608189b04ab67e
Author: John Griffith <email address hidden>
Date: Wed Jan 7 15:04:07 2015 -0700

Deal with tgt already exists errors

    So there's a major problem in the iscsi code (and has been
    for quite some time). The tgt show command sometimes seems
    to be corrupt and the issued command is actually a tgt create.
    The result is that we already have a tgt and the call raises
    which then causes the operation to fail.

    An example of the issue:
      Stdout: Unexpected error while running command.Command:
      sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgt-admin
      --update iqn.2010-10.org.openstack:volume-f055d3c5-db7a-
      484e-9d0d-b98495439413
      Exit code: 22
      Stdout:
      Command:tgtadm -C 0 --lld iscsi --op new --mode target
      --tid 1 -T iqn.2010-10.org.openstack:volume-f055d3c5-db7a-
      484e-9d0d-b98495439413
      exited with code: 22.
      Stderr: u'tgtadm: this target already exists

    What's disturbing however is that in that section of code
    we're sending a --op show!! Could be something we're
    doing with our member executor? Or maybe something to
    do with the new oslo concurrency code?

    Regardless, his patch intends to provide a clear marker for
    ER in the case that create export fails due to the
    target entry already existing.

    Also this patch will enable us to go ahead and just use
    the existing target rather than bomb out and fail everything.
    Root cause of why we're getting a second create is still
    unknown and needs addressed, but this might help in getting
    more info as well as keeping things stable until we address
    the root issue.

Change-Id: I0faf4d49b2d3e631b08ec1dff4361ff2376e3308
Partial-Bug: #1398078

Reviewed:  https://review.openstack.org/145608
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=941e5f761be165accaabbdeec1608189b04ab67e
Submitter: Jenkins
Branch:    master

commit 941e5f761be165accaabbdeec1608189b04ab67e
Author: John Griffith <john.griffith@solidfire.com>
Date:   Wed Jan 7 15:04:07 2015 -0700

Deal with tgt already exists errors
    
    So there's a major problem in the iscsi code (and has been
    for quite some time).  The tgt show command sometimes seems
    to be corrupt and the issued command is actually a tgt create.
    The result is that we already have a tgt and the call raises
    which then causes the operation to fail.
    
    An example of the issue:
      Stdout: Unexpected error while running command.Command:
      sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgt-admin
      --update iqn.2010-10.org.openstack:volume-f055d3c5-db7a-
      484e-9d0d-b98495439413
      Exit code: 22
      Stdout:
      Command:tgtadm -C 0 --lld iscsi --op new --mode target
      --tid 1 -T iqn.2010-10.org.openstack:volume-f055d3c5-db7a-
      484e-9d0d-b98495439413
      exited with code: 22.
      Stderr: u'tgtadm: this target already exists
    
    What's disturbing however is that in that section of code
    we're sending a --op show!!  Could be something we're
    doing with our member executor?  Or maybe something to
    do with the new oslo concurrency code?
    
    Regardless, his patch intends to provide a clear marker for
    ER in the case that create export fails due to the
    target entry already existing.
    
    Also this patch will enable us to go ahead and just use
    the existing target rather than bomb out and fail everything.
    Root cause of why we're getting a second create is still
    unknown and needs addressed, but this might help in getting
    more info as well as keeping things stable until we address
    the root issue.
    
    Change-Id: I0faf4d49b2d3e631b08ec1dff4361ff2376e3308
    Partial-Bug: #1398078

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-01-08:

#20

Reviewed: https://review.openstack.org/145630
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=80f47322982355ff6ec6eac87e09926432bf1f3e
Submitter: Jenkins
Branch: master

commit 80f47322982355ff6ec6eac87e09926432bf1f3e
Author: John Griffith <email address hidden>
Date: Wed Jan 7 16:25:48 2015 -0700

Use cinder.utils.execute directly

    When we were in the process of working on brick libs
    we did quite a bit of funky stuff with setting a member
    variable for each class to point to an executor that was
    passed in during init.

    There's no longer any reason to do this with the target drivers,
    so we should simplify our lives a bit and go back to using the
    good old cinder.utils wrapper.

    It's also believed that the use of the member variable is
    susceptible to some concurrency issue that was causing the
    wrong cmd string to be executed. We'll mark this as closing
    that bug and reopen if we still see the signature in Kibana.

Change-Id: I7a5648d496d15eec5d7ac3643411198226f1ffdf
Closes-Bug: #1398078

Changed in cinder:
status:	In Progress → Fix Committed

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-01-09: Related fix proposed to cinder (master)

#21

Related fix proposed to branch: master
Review: https://review.openstack.org/145985

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-01-10: Related fix merged to cinder (master)

#22

Reviewed: https://review.openstack.org/145985
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=1e49234db32d2b4ba000e676ab36602a313b6784
Submitter: Jenkins
Branch: master

commit 1e49234db32d2b4ba000e676ab36602a313b6784
Author: John Griffith <email address hidden>
Date: Thu Jan 8 17:42:05 2015 -0700

Add debug messaging for tgt already exists

    We have a race condition where the call to create an
    iscsi target seems to be issued twice. This patch
    adds some debug logging to try and get some more info
    around where things are going wrong.

First, we add a log message to the intialize_connection call
in the cinder.volume.api and include the connector info.

Second, we add a target show output prior to the update/create
command in the target driver.

    In addition, we also had the update and show commands in the same
    try/except block which led us to believe the show was failing,
    but that's NOT the case at all. So this patch also moves that
    existing show (debug purposes as well) out of the main try block.

    Note that there are some old bugs logged against TgtAdm for issues
    with it "loosing count" of target id's it's issued, and in turn
    reissuing the same ID multiple times. This patch is intended to
    give us enough info to determine if that's what's going on or not.

Other possibility is that the initialize call is being issued more
than once.

Change-Id: If17acff6c0b697261ae0e7a40fc26962cbee6c7b
Related-Bug: #1398078

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-02-05: Fix merged to cinder (master)

#23

Reviewed: https://review.openstack.org/147566
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=4dc367f7d736ae506db9532270e92977c8b3954d
Submitter: Jenkins
Branch: master

commit 4dc367f7d736ae506db9532270e92977c8b3954d
Author: John Griffith <email address hidden>
Date: Thu Jan 15 08:56:28 2015 -0700

Add retry for tgtadm update when tgt exists

    For target creation using tgtadm driver, we create a persistence
    file for the target, then send a tgt-admin --update 'name' where
    name is the specific persistence file we want to read in and update
    from.

    It turns out that we can hit race conditions where the persistence
    file is written, and an update is requested but target has already
    done work to make it believe that the target has already been created.

One thought was to just use "update ALL" but this still seems to have
issues, and changes the error to an account exists failure.

    This patch takes the brute force approach and adds the cinder.utils
    retry decorator to the tgt-admin --update command. To do this
    we just break out the tgt-admin --update call into it's own method
    and add the decorator to it.

Closes-Bug: #1398078

Change-Id: Ic72fb5f4fca70a7e7d1c306efe19674fd7e42cd6

Thierry Carrez (ttx) on 2015-02-05

Changed in cinder:
milestone:	none → kilo-2
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2015-04-30

Changed in cinder:
milestone:	kilo-2 → 2015.1.0

Revision history for this message

Darren Carpenter (wdarrenc) wrote on 2015-08-14:

#24

Any chance for a backport of this to Juno?

Revision history for this message

John Griffith (john-griffith) wrote on 2015-08-14:

#25

Backport was proposed for Juno but rejected because it brought in an external lib making things difficult for package maintainers. Proposals here:
https://review.openstack.org/#/c/168424/
https://review.openstack.org/#/c/168419/

IIRC Mike said he was going to work this up, I'll take a look and submit something. Not sure it will get approval at this point or not, but I'll propose it anyway. Should be easy, Vishy already had a version I can just pull that over and use it.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-08-14: Fix proposed to cinder (stable/juno)

#26

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/213304

Revision history for this message

John Griffith (john-griffith) wrote on 2015-08-14:

#27

Submitted here:
https://review.openstack.org/#/c/213304/

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-11-13: Change abandoned on cinder (stable/juno)

#28

Change abandoned by Alan Pevec (<email address hidden>) on branch: stable/juno
Review: https://review.openstack.org/213304

Revision history for this message

zhangsong (zhangsong) wrote on 2016-05-31:

#29

The current code will retry if it encounters the error, but this is a work-around, not the root cause.

Current code of tgt.py use command 'tgt-admin --update name' to create the iscsi target of tgt:
https://github.com/openstack/cinder/blob/master/cinder/volume/targets/tgt.py#L119

The workflow of tgt-admin is:
1. get the available tid(usually it is the current max tid plus 1),I named it next_tid ;
2. use the next_tid to create a target:
tgtadm -C 0 --lld iscsi --op new --mode target --tid next_tid -T target_name

Here is the source code of tgt-admin:
https://github.com/fujita/tgt/blob/master/scripts/tgt-admin#L278

But there is a race condition between step 1 and step 2. If multiple tgt-admin commands executed simultaneous in the same node, some of them have the chance to get the same next_tid, and then use the same tid to create target, this will cause the error of 'tgtadm: this target already exists'.

I think the best way to solve this problem is to add a lock when create a tgt target.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-05-31: Fix proposed to cinder (master)

#30

Fix proposed to branch: master
Review: https://review.openstack.org/323452

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2017-01-10: Change abandoned on cinder (master)

#31

Change abandoned by zhangsong (<email address hidden>) on branch: master
Review: https://review.openstack.org/323452
Reason: Retry not cause a problem, just keep
it. Thank you.

Cinder

test_delete_server_while_in_attached_volume failed with "Volume None failed to reach in-use status within the required time"

Bug Description

Other bug subscribers

Remote bug watches