test_delete_server_while_in_attached_volume failed with "Volume None failed to reach in-use status within the required time"
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Cinder |
Fix Released
|
High
|
John Griffith |
Bug Description
There is obviously a tempest bug here too in that it's not showing the volume uuid in the error message, unless the bug IS in tempest and it's not tracking the UUID correctly.
Matt Riedemann (mriedem) wrote : | #1 |
Matt Riedemann (mriedem) wrote : | #2 |
I see this in the cinder volume logs:
2014-11-26 14:16:03.152 1390 WARNING cinder.
Command: None
Exit code: -
Stdout: u"Unexpected error while running command.\nCommand: sudo cinder-rootwrap /etc/cinder/
Stderr: None
2014-11-26 14:16:03.153 1390 ERROR oslo.messaging.
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.
2014-11-26 14:16:03.153 1390 TRACE oslo.messaging.
Matt Riedemann (mriedem) wrote : | #3 |
So the volume uuid is c1dfc318-
Matt Riedemann (mriedem) wrote : | #4 |
This seems to drop off of logstash after 11/27 but I don't see anything in cinder commits since then that look related. Maybe it was fixed in Tempest with this?
https:/
Matt Riedemann (mriedem) wrote : | #5 |
Here is the problem in Tempest:
That error message should be using the volume id (uuid) rather than the volume name, which is a different response between cinder v1 and v2.
Matt Riedemann (mriedem) wrote : | #6 |
Related Tempest timeout exception message fix: https:/
Matt Riedemann (mriedem) wrote : | #7 |
jgriffith pointed this out:
Sounds like that shouldn't happen, bug since it's logged at debug level logstash won't pick it up so we don't know the correlation to gate failures. He's going to push a change to make that WARNING level so we can see if it's related.
The weird thing is that when looking at the command failure here:
That's doing an op new but the code at:
File "/opt/stack/
is doing an op show. It seems like there is some processutils/
Matt Riedemann (mriedem) wrote : | #8 |
There is another volume being successfully created at the same time as the one that fails, I'm wondering if there is shared global variables in this code or something that is contributing to the weird things we're seeing in the logs, where it looks like maybe something is getting mixed up in the flow.
Matt Riedemann (mriedem) wrote : | #9 |
Matt Riedemann (mriedem) wrote : | #10 |
e-r query: https:/
Changed in cinder: | |
importance: | Undecided → High |
status: | New → Confirmed |
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master) | #11 |
Fix proposed to branch: master
Review: https:/
Changed in cinder: | |
assignee: | nobody → John Griffith (john-griffith) |
status: | Confirmed → In Progress |
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master) | #12 |
Change abandoned by John Griffith (<email address hidden>) on branch: master
Review: https:/
Reason: splitting up into two patches
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master) | #13 |
Fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master) | #14 |
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit d7bd65e8e9ff800
Author: John Griffith <email address hidden>
Date: Mon Dec 1 13:29:02 2014 -0700
Deal with tgt already exists errors
This patch intends to provide a clear marker for
ER in the case that create export fails due to the
target entry already existing.
Also this patch will enable us to go ahead and just use
the existing target rather than bomb out and fail everything.
Root cause of why we're getting a second create is still
unknown and needs addressed, but this might help in getting
more info as well as keeping things stable until we address
the root issue.
Change-Id: If96f4fab7d9709
Partial-Bug: #1398078
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master) | #15 |
Related fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master) | #16 |
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit eb20e9e5009541e
Author: Matt Riedemann <email address hidden>
Date: Mon Dec 15 11:23:42 2014 -0800
Add unit test for commit 22abe9081
Commit 22abe9081 fixed a bug in commit d7bd65e8e which we could have
found the first time with a simple unit test.
This change adds the unit test to cover that new error handling block.
Related-Bug: #1398078
Change-Id: I86d2fd477f3ae5
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master) | #17 |
Fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : | #18 |
Fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master) | #19 |
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit 941e5f761be165a
Author: John Griffith <email address hidden>
Date: Wed Jan 7 15:04:07 2015 -0700
Deal with tgt already exists errors
So there's a major problem in the iscsi code (and has been
for quite some time). The tgt show command sometimes seems
to be corrupt and the issued command is actually a tgt create.
The result is that we already have a tgt and the call raises
which then causes the operation to fail.
An example of the issue:
Stdout: Unexpected error while running command.Command:
sudo cinder-rootwrap /etc/cinder/
--update iqn.2010-
484e-
Exit code: 22
Stdout:
Command:
--tid 1 -T iqn.2010-
484e-
exited with code: 22.
Stderr: u'tgtadm: this target already exists
What's disturbing however is that in that section of code
we're sending a --op show!! Could be something we're
doing with our member executor? Or maybe something to
do with the new oslo concurrency code?
Regardless, his patch intends to provide a clear marker for
ER in the case that create export fails due to the
target entry already existing.
Also this patch will enable us to go ahead and just use
the existing target rather than bomb out and fail everything.
Root cause of why we're getting a second create is still
unknown and needs addressed, but this might help in getting
more info as well as keeping things stable until we address
the root issue.
Change-Id: I0faf4d49b2d3e6
Partial-Bug: #1398078
OpenStack Infra (hudson-openstack) wrote : | #20 |
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit 80f47322982355f
Author: John Griffith <email address hidden>
Date: Wed Jan 7 16:25:48 2015 -0700
Use cinder.
When we were in the process of working on brick libs
we did quite a bit of funky stuff with setting a member
variable for each class to point to an executor that was
passed in during init.
There's no longer any reason to do this with the target drivers,
so we should simplify our lives a bit and go back to using the
good old cinder.utils wrapper.
It's also believed that the use of the member variable is
susceptible to some concurrency issue that was causing the
wrong cmd string to be executed. We'll mark this as closing
that bug and reopen if we still see the signature in Kibana.
Change-Id: I7a5648d496d15e
Closes-Bug: #1398078
Changed in cinder: | |
status: | In Progress → Fix Committed |
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master) | #21 |
Related fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master) | #22 |
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit 1e49234db32d2b4
Author: John Griffith <email address hidden>
Date: Thu Jan 8 17:42:05 2015 -0700
Add debug messaging for tgt already exists
We have a race condition where the call to create an
iscsi target seems to be issued twice. This patch
adds some debug logging to try and get some more info
around where things are going wrong.
First, we add a log message to the intialize_
in the cinder.volume.api and include the connector info.
Second, we add a target show output prior to the update/create
command in the target driver.
In addition, we also had the update and show commands in the same
try/except block which led us to believe the show was failing,
but that's NOT the case at all. So this patch also moves that
existing show (debug purposes as well) out of the main try block.
Note that there are some old bugs logged against TgtAdm for issues
with it "loosing count" of target id's it's issued, and in turn
reissuing the same ID multiple times. This patch is intended to
give us enough info to determine if that's what's going on or not.
Other possibility is that the initialize call is being issued more
than once.
Change-Id: If17acff6c0b697
Related-Bug: #1398078
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master) | #23 |
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit 4dc367f7d736ae5
Author: John Griffith <email address hidden>
Date: Thu Jan 15 08:56:28 2015 -0700
Add retry for tgtadm update when tgt exists
For target creation using tgtadm driver, we create a persistence
file for the target, then send a tgt-admin --update 'name' where
name is the specific persistence file we want to read in and update
from.
It turns out that we can hit race conditions where the persistence
file is written, and an update is requested but target has already
done work to make it believe that the target has already been created.
One thought was to just use "update ALL" but this still seems to have
issues, and changes the error to an account exists failure.
This patch takes the brute force approach and adds the cinder.utils
retry decorator to the tgt-admin --update command. To do this
we just break out the tgt-admin --update call into it's own method
and add the decorator to it.
Closes-Bug: #1398078
Change-Id: Ic72fb5f4fca70a
Changed in cinder: | |
milestone: | none → kilo-2 |
status: | Fix Committed → Fix Released |
Changed in cinder: | |
milestone: | kilo-2 → 2015.1.0 |
Darren Carpenter (wdarrenc) wrote : | #24 |
Any chance for a backport of this to Juno?
John Griffith (john-griffith) wrote : | #25 |
Backport was proposed for Juno but rejected because it brought in an external lib making things difficult for package maintainers. Proposals here:
https:/
https:/
IIRC Mike said he was going to work this up, I'll take a look and submit something. Not sure it will get approval at this point or not, but I'll propose it anyway. Should be easy, Vishy already had a version I can just pull that over and use it.
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/juno) | #26 |
Fix proposed to branch: stable/juno
Review: https:/
John Griffith (john-griffith) wrote : | #27 |
Submitted here:
https:/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (stable/juno) | #28 |
Change abandoned by Alan Pevec (<email address hidden>) on branch: stable/juno
Review: https:/
zhangsong (zhangsong) wrote : | #29 |
The current code will retry if it encounters the error, but this is a work-around, not the root cause.
Current code of tgt.py use command 'tgt-admin --update name' to create the iscsi target of tgt:
https:/
The workflow of tgt-admin is:
1. get the available tid(usually it is the current max tid plus 1),I named it next_tid ;
2. use the next_tid to create a target:
tgtadm -C 0 --lld iscsi --op new --mode target --tid next_tid -T target_name
Here is the source code of tgt-admin:
https:/
But there is a race condition between step 1 and step 2. If multiple tgt-admin commands executed simultaneous in the same node, some of them have the chance to get the same next_tid, and then use the same tid to create target, this will cause the error of 'tgtadm: this target already exists'.
I think the best way to solve this problem is to add a lock when create a tgt target.
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master) | #30 |
Fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master) | #31 |
Change abandoned by zhangsong (<email address hidden>) on branch: master
Review: https:/
Reason: Retry not cause a problem, just keep
it. Thank you.
http:// logstash. openstack. org/#eyJzZWFyY2 giOiJtZXNzYWdlO lwiVm9sdW1lIE5v bmUgZmFpbGVkIHR vIHJlYWNoIGluLX VzZSBzdGF0dXMgd 2l0aGluIHRoZSBy ZXF1aXJlZCB0aW1 lXCIgQU5EIHRhZ3 M6XCJjb25zb2xlX CIiLCJmaWVsZHMi OltdLCJvZmZzZXQ iOjAsInRpbWVmcm FtZSI6IjYwNDgwM CIsImdyYXBobW9k ZSI6ImNvdW50Iiw idGltZSI6eyJ1c2 VyX2ludGVydmFsI jowfSwic3RhbXAi OjE0MTc0NTE3OTc 1MTIsIm1vZGUiOi IiLCJhbmFseXplX 2ZpZWxkIjoiIn0=
13 hits in 7 days, check and gate, all failures.