boto tests randomly fail in resource_cleanup with "Failed to delete volume"

Bug #1450110 reported by Matt Riedemann
Affects: Cinder
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

http://logs.openstack.org/80/174480/15/check/check-tempest-dsvm-full/27b3bb4/console.html#_2015-04-29_15_26_48_298

2015-04-29 15:26:48.298 | tearDownClass (tempest.thirdparty.boto.test_ec2_instance_run.InstanceRunTest)
2015-04-29 15:26:48.298 | -----------------------------------------------------------------------------
2015-04-29 15:26:48.298 |
2015-04-29 15:26:48.299 | Captured traceback:
2015-04-29 15:26:48.299 | ~~~~~~~~~~~~~~~~~~~
2015-04-29 15:26:48.299 | Traceback (most recent call last):
2015-04-29 15:26:48.299 | File "tempest/test.py", line 288, in tearDownClass
2015-04-29 15:26:48.300 | teardown()
2015-04-29 15:26:48.300 | File "tempest/thirdparty/boto/test.py", line 283, in resource_cleanup
2015-04-29 15:26:48.300 | raise exceptions.TearDownException(num=fail_count)
2015-04-29 15:26:48.301 | tempest.exceptions.TearDownException: 1 cleanUp operation failed

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiRmlsZSBcXFwidGVtcGVzdC90aGlyZHBhcnR5L2JvdG8vdGVzdC5weVxcXCJcIiBBTkQgbWVzc2FnZTpcImluIHJlc291cmNlX2NsZWFudXBcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTQzMDMyMjY0NzIwNn0=

The tempest log has a more specific error:

http://logs.openstack.org/80/174480/15/check/check-tempest-dsvm-full/27b3bb4/logs/tempest.txt.gz?level=TRACE#_2015-04-29_15_22_12_234

2015-04-29 15:22:12.234 5359 ERROR tempest.thirdparty.boto.test [-] Cleanup failed APIClientEC2.deregister_image(aki-00000010)
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test Traceback (most recent call last):
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test File "tempest/thirdparty/boto/test.py", line 272, in resource_cleanup
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test function(*pos_args, **kw_args)
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test File "tempest/services/botoclients.py", line 68, in func
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test return getattr(conn, name)(*args, **kwargs)
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test File "/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/boto/ec2/connection.py", line 394, in deregister_image
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test params, verb='POST')
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test File "/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/boto/connection.py", line 1227, in get_status
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test raise self.ResponseError(response.status, response.reason, body)
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test EC2ResponseError: EC2ResponseError: 404 Not Found
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test <?xml version="1.0"?>
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test <Response><Errors><Error><Code>ImageNotFound</Code><Message>Image 19f1e137-3765-48ef-a57e-ab166d26d746 could not be found.</Message></Error></Errors><RequestID>req-5cc6958d-3ae1-40f1-9b8e-3afd08c7d494</RequestID></Response>
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test
2015-04-29 15:22:12.725 5359 ERROR tempest.test [-] teardown of resources failed: 1 cleanUp operation failed
2015-04-29 15:22:12.725 5359 TRACE tempest.test Traceback (most recent call last):
2015-04-29 15:22:12.725 5359 TRACE tempest.test File "tempest/test.py", line 288, in tearDownClass
2015-04-29 15:22:12.725 5359 TRACE tempest.test teardown()
2015-04-29 15:22:12.725 5359 TRACE tempest.test File "tempest/thirdparty/boto/test.py", line 283, in resource_cleanup
2015-04-29 15:22:12.725 5359 TRACE tempest.test raise exceptions.TearDownException(num=fail_count)
2015-04-29 15:22:12.725 5359 TRACE tempest.test TearDownException: 1 cleanUp operation failed
2015-04-29 15:22:12.725 5359 TRACE tempest.test

This more specific error doesn't show up in logstash as often:

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiQ2xlYW51cCBmYWlsZWQgQVBJQ2xpZW50RUMyLmRlcmVnaXN0ZXJfaW1hZ2VcIiBBTkQgdGFnczpcInRlbXBlc3QudHh0XCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0MzAzMjI3NDc5NDksIm1vZGUiOiIiLCJhbmFseXplX2ZpZWxkIjoiIn0=

Tags: boto
Revision history for this message
Matt Riedemann (mriedem) wrote :

Also seeing this in the gate:

http://logs.openstack.org/40/164340/4/gate/gate-tempest-dsvm-postgres-full/4ef4e88/logs/tempest.txt.gz?level=TRACE#_2015-04-24_17_27_55_963

2015-04-24 17:27:55.963 4074 ERROR tempest.thirdparty.boto.test [-] Failed to delete volume Volume:vol-00000001
2015-04-24 17:27:55.963 4074 TRACE tempest.thirdparty.boto.test Traceback (most recent call last):
2015-04-24 17:27:55.963 4074 TRACE tempest.thirdparty.boto.test File "tempest/thirdparty/boto/test.py", line 527, in destroy_volume_wait
2015-04-24 17:27:55.963 4074 TRACE tempest.thirdparty.boto.test wait.re_search_wait(_volume_state, "available")
2015-04-24 17:27:55.963 4074 TRACE tempest.thirdparty.boto.test File "tempest/thirdparty/boto/utils/wait.py", line 75, in re_search_wait
2015-04-24 17:27:55.963 4074 TRACE tempest.thirdparty.boto.test (dtime, regexp, text))
2015-04-24 17:27:55.963 4074 TRACE tempest.thirdparty.boto.test AssertionError: Pattern find timeout exceeded!(196s) While waiting for"available" pattern in "error"
2015-04-24 17:27:55.963 4074 TRACE tempest.thirdparty.boto.test
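For context, the `re_search_wait` helper in the traceback above polls a state function until its output matches an expected regex or a timeout expires; when the volume goes to "error" instead of "available", the poll times out as shown. A minimal sketch of that pattern (function names, timings, and the simulated state sequence are illustrative, not tempest's actual implementation):

```python
import re
import time


def re_search_wait(get_text, regexp, timeout=10, interval=0.5):
    """Poll get_text() until its result matches regexp, or time out.

    Mirrors the idea behind tempest's wait helper: the volume state
    string is re-fetched on each iteration and compared against the
    expected pattern (e.g. "available").
    """
    deadline = time.monotonic() + timeout
    while True:
        text = get_text()
        if re.search(regexp, text):
            return text
        if time.monotonic() >= deadline:
            raise AssertionError(
                'Pattern find timeout exceeded! (%ds) While waiting for '
                '"%s" pattern in "%s"' % (timeout, regexp, text))
        time.sleep(interval)


# Example: a volume that becomes "available" after two polls.
states = iter(["creating", "creating", "available"])
result = re_search_wait(lambda: next(states), "available", timeout=5)
```

If the volume lands in a terminal "error" state, the state string never matches, so the helper can only fail by timeout, which is exactly the 196-second failure in the log.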

Revision history for this message
Matt Riedemann (mriedem) wrote :
summary: - boto tests randomly fail in resource_cleanup
+ boto tests randomly fail in resource_cleanup with "Failed to delete
+ volume"
no longer affects: tempest
Matt Riedemann (mriedem)
Changed in cinder:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Mike Perez (thingee) wrote :

From some searching, it looks like the recommendation is to retry the same command. Well, we tried that in https://bugs.launchpad.net/cinder/+bug/1335905 and apparently it still isn't working for us.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/178872

Changed in cinder:
assignee: nobody → John Griffith (john-griffith)
status: Confirmed → In Progress
Revision history for this message
John Griffith (john-griffith) wrote :

@Mike
Please see my note/comment in the other bug. This is actually a different bug: the one that was addressed covered failures on create, while this one is about failures on delete, specifically snapshot delete.

Also of concern are the checksum errors appearing on the backing disks in a good number of the failures here.

Revision history for this message
Mike Perez (thingee) wrote :

@John Thanks John, my mistake! I removed the dup tag.

Revision history for this message
Eric Harney (eharney) wrote :

Yeah, I'm really wondering what the explanation is for:

2015-04-24 17:18:57.297 32728 TRACE cinder.brick.local_dev.lvm Stderr: u' /dev/vdb2: Checksum error\n Volume group "stack-volumes-lvmdriver-1" not found\n'

Also, does this happen for any target drivers other than tgtadm?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/178872
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=b813f3f807019fa4e87de644674fc41051b63118
Submitter: Jenkins
Branch: master

commit b813f3f807019fa4e87de644674fc41051b63118
Author: John Griffith <email address hidden>
Date: Wed Apr 29 17:25:12 2015 -0600

    Add retry to lvm delete

    Seems we have another issue related to lvm and
    snapshots, but this time it's on the delete side.

    We used a simple retry mechanism for snapshot create
    here:
      https://review.openstack.org/#/c/149360/8

    I'm hesitant to just add another retry without looking
    at dm issues, but this should address the problem
    temporarily and give us a chance to collect some
    data on the issue.

    Change-Id: I37607a09e2f9af6f2f32ddfaa8567400541e533a
    Closes-Bug: #1450110
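The approach the fix describes, retrying the LVM delete a bounded number of times before giving up, can be illustrated with a generic retry decorator. This is only a sketch of the pattern (cinder has its own retry utility; the decorator, attempt count, and the simulated failing delete below are all illustrative):

```python
import functools
import time


def retry(exc_type, attempts=3, delay=0.1):
    """Retry the wrapped call on exc_type, a simplified take on the
    retry-on-delete approach the fix adds around the LVM delete."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == attempts:
                        raise  # out of attempts: surface the failure
                    time.sleep(delay)
        return wrapper
    return decorator


calls = {"n": 0}


@retry(RuntimeError, attempts=3)
def delete_volume(name):
    # Simulated LVM delete that fails twice (e.g. a transient device-mapper
    # issue) and then succeeds; a stand-in for the real lvremove call.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Unable to remove volume %s" % name)
    return "deleted %s" % name


result = delete_volume("vol-00000001")
```

As the commit message notes, this papers over the underlying device-mapper issue rather than fixing it; the retry buys time to collect data on the root cause.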

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: liberty-1 → 7.0.0
Changed in cinder:
assignee: John Griffith (john-griffith) → nobody