boto tests randomly fail in resource_cleanup with "Failed to delete volume"

Bug #1450110 reported by Matt Riedemann
Affects: Cinder
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

http://logs.openstack.org/80/174480/15/check/check-tempest-dsvm-full/27b3bb4/console.html#_2015-04-29_15_26_48_298

2015-04-29 15:26:48.298 | tearDownClass (tempest.thirdparty.boto.test_ec2_instance_run.InstanceRunTest)
2015-04-29 15:26:48.298 | -----------------------------------------------------------------------------
2015-04-29 15:26:48.298 |
2015-04-29 15:26:48.299 | Captured traceback:
2015-04-29 15:26:48.299 | ~~~~~~~~~~~~~~~~~~~
2015-04-29 15:26:48.299 | Traceback (most recent call last):
2015-04-29 15:26:48.299 | File "tempest/test.py", line 288, in tearDownClass
2015-04-29 15:26:48.300 | teardown()
2015-04-29 15:26:48.300 | File "tempest/thirdparty/boto/test.py", line 283, in resource_cleanup
2015-04-29 15:26:48.300 | raise exceptions.TearDownException(num=fail_count)
2015-04-29 15:26:48.301 | tempest.exceptions.TearDownException: 1 cleanUp operation failed

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiRmlsZSBcXFwidGVtcGVzdC90aGlyZHBhcnR5L2JvdG8vdGVzdC5weVxcXCJcIiBBTkQgbWVzc2FnZTpcImluIHJlc291cmNlX2NsZWFudXBcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTQzMDMyMjY0NzIwNn0=

The tempest log has a more specific error:

http://logs.openstack.org/80/174480/15/check/check-tempest-dsvm-full/27b3bb4/logs/tempest.txt.gz?level=TRACE#_2015-04-29_15_22_12_234

2015-04-29 15:22:12.234 5359 ERROR tempest.thirdparty.boto.test [-] Cleanup failed APIClientEC2.deregister_image(aki-00000010)
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test Traceback (most recent call last):
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test File "tempest/thirdparty/boto/test.py", line 272, in resource_cleanup
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test function(*pos_args, **kw_args)
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test File "tempest/services/botoclients.py", line 68, in func
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test return getattr(conn, name)(*args, **kwargs)
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test File "/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/boto/ec2/connection.py", line 394, in deregister_image
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test params, verb='POST')
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test File "/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/boto/connection.py", line 1227, in get_status
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test raise self.ResponseError(response.status, response.reason, body)
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test EC2ResponseError: EC2ResponseError: 404 Not Found
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test <?xml version="1.0"?>
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test <Response><Errors><Error><Code>ImageNotFound</Code><Message>Image 19f1e137-3765-48ef-a57e-ab166d26d746 could not be found.</Message></Error></Errors><RequestID>req-5cc6958d-3ae1-40f1-9b8e-3afd08c7d494</RequestID></Response>
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test
2015-04-29 15:22:12.725 5359 ERROR tempest.test [-] teardown of resources failed: 1 cleanUp operation failed
2015-04-29 15:22:12.725 5359 TRACE tempest.test Traceback (most recent call last):
2015-04-29 15:22:12.725 5359 TRACE tempest.test File "tempest/test.py", line 288, in tearDownClass
2015-04-29 15:22:12.725 5359 TRACE tempest.test teardown()
2015-04-29 15:22:12.725 5359 TRACE tempest.test File "tempest/thirdparty/boto/test.py", line 283, in resource_cleanup
2015-04-29 15:22:12.725 5359 TRACE tempest.test raise exceptions.TearDownException(num=fail_count)
2015-04-29 15:22:12.725 5359 TRACE tempest.test TearDownException: 1 cleanUp operation failed
2015-04-29 15:22:12.725 5359 TRACE tempest.test

This more specific error doesn't show up in logstash as often:

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiQ2xlYW51cCBmYWlsZWQgQVBJQ2xpZW50RUMyLmRlcmVnaXN0ZXJfaW1hZ2VcIiBBTkQgdGFnczpcInRlbXBlc3QudHh0XCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0MzAzMjI3NDc5NDksIm1vZGUiOiIiLCJhbmFseXplX2ZpZWxkIjoiIn0=

Tags: boto
Revision history for this message
Matt Riedemann (mriedem) wrote :

Also seeing this in the gate:

http://logs.openstack.org/40/164340/4/gate/gate-tempest-dsvm-postgres-full/4ef4e88/logs/tempest.txt.gz?level=TRACE#_2015-04-24_17_27_55_963

2015-04-24 17:27:55.963 4074 ERROR tempest.thirdparty.boto.test [-] Failed to delete volume Volume:vol-00000001
2015-04-24 17:27:55.963 4074 TRACE tempest.thirdparty.boto.test Traceback (most recent call last):
2015-04-24 17:27:55.963 4074 TRACE tempest.thirdparty.boto.test File "tempest/thirdparty/boto/test.py", line 527, in destroy_volume_wait
2015-04-24 17:27:55.963 4074 TRACE tempest.thirdparty.boto.test wait.re_search_wait(_volume_state, "available")
2015-04-24 17:27:55.963 4074 TRACE tempest.thirdparty.boto.test File "tempest/thirdparty/boto/utils/wait.py", line 75, in re_search_wait
2015-04-24 17:27:55.963 4074 TRACE tempest.thirdparty.boto.test (dtime, regexp, text))
2015-04-24 17:27:55.963 4074 TRACE tempest.thirdparty.boto.test AssertionError: Pattern find timeout exceeded!(196s) While waiting for"available" pattern in "error"
2015-04-24 17:27:55.963 4074 TRACE tempest.thirdparty.boto.test
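For context, the `re_search_wait` helper in the traceback above polls a state function until its output matches an expected regex or a timeout expires; when the volume goes to "error" instead of "available", the poll times out as shown. A minimal sketch of that pattern (function names, timings, and the simulated state sequence are illustrative, not tempest's actual implementation):

```python
import re
import time


def re_search_wait(get_text, regexp, timeout=10, interval=0.5):
    """Poll get_text() until its result matches regexp, or time out.

    Mirrors the idea behind tempest's wait helper: the volume state
    string is re-fetched on each iteration and compared against the
    expected pattern (e.g. "available").
    """
    deadline = time.monotonic() + timeout
    while True:
        text = get_text()
        if re.search(regexp, text):
            return text
        if time.monotonic() >= deadline:
            raise AssertionError(
                'Pattern find timeout exceeded! (%ds) While waiting for '
                '"%s" pattern in "%s"' % (timeout, regexp, text))
        time.sleep(interval)


# Example: a volume that becomes "available" after two polls.
states = iter(["creating", "creating", "available"])
result = re_search_wait(lambda: next(states), "available", timeout=5)
```

If the volume lands in a terminal "error" state, the state string never matches, so the helper can only fail by timeout, which is exactly the 196-second failure in the log.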

Revision history for this message
Matt Riedemann (mriedem) wrote :
summary: - boto tests randomly fail in resource_cleanup
+ boto tests randomly fail in resource_cleanup with "Failed to delete
+ volume"
no longer affects: tempest
Matt Riedemann (mriedem)
Changed in cinder:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Mike Perez (thingee) wrote :

From some searching, it looks like the recommendation is to retry the same command. Well, we tried that in https://bugs.launchpad.net/cinder/+bug/1335905 and apparently it still isn't working for us.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/178872

Changed in cinder:
assignee: nobody → John Griffith (john-griffith)
status: Confirmed → In Progress
Revision history for this message
John Griffith (john-griffith) wrote :

@Mike
Please see my note/comment in the other bug. This is actually a different bug: the one that was addressed covered failures on create, while this one is about failures on delete, specifically snapshot delete.

Also of concern are the checksum errors appearing on the backing disks in a good number of the failures here.

Revision history for this message
Mike Perez (thingee) wrote :

@John Thanks John, my mistake! I removed the dup tag.

Revision history for this message
Eric Harney (eharney) wrote :

Yeah, I'm really wondering what the explanation is for:

2015-04-24 17:18:57.297 32728 TRACE cinder.brick.local_dev.lvm Stderr: u' /dev/vdb2: Checksum error\n Volume group "stack-volumes-lvmdriver-1" not found\n'

Also, does this happen for any target drivers other than tgtadm?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/178872
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=b813f3f807019fa4e87de644674fc41051b63118
Submitter: Jenkins
Branch: master

commit b813f3f807019fa4e87de644674fc41051b63118
Author: John Griffith <email address hidden>
Date: Wed Apr 29 17:25:12 2015 -0600

    Add retry to lvm delete

    Seems we have another issue related to lvm and
    snapshots, but this time it's on the delete side.

    We used a simple retry mechanism for snapshot create
    here:
      https://review.openstack.org/#/c/149360/8

    I'm hesitant to just add another retry without looking
    at dm issues, but this should address the problem
    temporarily and give us a chance to collect some
    data on the issue.

    Change-Id: I37607a09e2f9af6f2f32ddfaa8567400541e533a
    Closes-Bug: #1450110
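The approach the fix describes, retrying the LVM delete a bounded number of times before giving up, can be illustrated with a generic retry decorator. This is only a sketch of the pattern (cinder has its own retry utility; the decorator, attempt count, and the simulated failing delete below are all illustrative):

```python
import functools
import time


def retry(exc_type, attempts=3, delay=0.1):
    """Retry the wrapped call on exc_type, a simplified take on the
    retry-on-delete approach the fix adds around the LVM delete."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == attempts:
                        raise  # out of attempts: surface the failure
                    time.sleep(delay)
        return wrapper
    return decorator


calls = {"n": 0}


@retry(RuntimeError, attempts=3)
def delete_volume(name):
    # Simulated LVM delete that fails twice (e.g. a transient device-mapper
    # issue) and then succeeds; a stand-in for the real lvremove call.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Unable to remove volume %s" % name)
    return "deleted %s" % name


result = delete_volume("vol-00000001")
```

As the commit message notes, this papers over the underlying device-mapper issue rather than fixing it; the retry buys time to collect data on the root cause.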

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: liberty-1 → 7.0.0
Changed in cinder:
assignee: John Griffith (john-griffith) → nobody