Volume group for uuid not found

Bug #1335905 reported by Sean Dague
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Cinder
Fix Released
High
John Griffith

Bug Description

There is a periodically occurring race condition on Volumes that exposes as the following error in Cinder Volume service in the gate:

2014-06-30 14:07:17.205 19616 ERROR cinder.brick.local_dev.lvm [req-3f6d886d-cf19-4fe8-a917-d0ff6c97940b dd9fe73bc43e440fab74983df09b4888 2a9f335f064a4466b37daddbb1a1ff7e - - -] Cmd :sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvcreate --name _snapshot-207e0ef0-dc73-4c1e-955f-375358fa7b51 --snapshot stack-volumes/volume-13b6dbcf-e414-4939-bfce-1eea3b272f4f -L 1.00g
2014-06-30 14:07:17.205 19616 ERROR cinder.brick.local_dev.lvm [req-3f6d886d-cf19-4fe8-a917-d0ff6c97940b dd9fe73bc43e440fab74983df09b4888 2a9f335f064a4466b37daddbb1a1ff7e - - -] StdOut :
2014-06-30 14:07:17.205 19616 ERROR cinder.brick.local_dev.lvm [req-3f6d886d-cf19-4fe8-a917-d0ff6c97940b dd9fe73bc43e440fab74983df09b4888 2a9f335f064a4466b37daddbb1a1ff7e - - -] StdErr : /dev/loop1: Checksum error
  Volume group for uuid not found: JddPC4d4gvMd9RxcWM5Tzqn8W6Jh8BtbPaCSTd1yKHMK3chelWx8EFdNx0R4Ag0x
  Problem reactivating origin volume-13b6dbcf-e414-4939-bfce-1eea3b272f4f
  libdevmapper exiting with 2 device(s) still suspended.

2014-06-30 14:07:17.236 19616 ERROR oslo.messaging.rpc.dispatcher [req-3f6d886d-cf19-4fe8-a917-d0ff6c97940b dd9fe73bc43e440fab74983df09b4888 2a9f335f064a4466b37daddbb1a1ff7e - - -] Exception during message handling: Unexpected error while running command.
Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvcreate --name _snapshot-207e0ef0-dc73-4c1e-955f-375358fa7b51 --snapshot stack-volumes/volume-13b6dbcf-e414-4939-bfce-1eea3b272f4f -L 1.00g
Exit code: 5
Stdout: ''
Stderr: ' /dev/loop1: Checksum error\n Volume group for uuid not found: JddPC4d4gvMd9RxcWM5Tzqn8W6Jh8BtbPaCSTd1yKHMK3chelWx8EFdNx0R4Ag0x\n Problem reactivating origin volume-13b6dbcf-e414-4939-bfce-1eea3b272f4f\n libdevmapper exiting with 2 device(s) still suspended.\n'

Error log here - http://logs.openstack.org/35/103535/3/gate/gate-tempest-dsvm-full/08c8ed9/logs/screen-c-vol.txt.gz?level=TRACE#_2014-06-30_14_07_17_205

Tags: gate-failure
Changed in cinder:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Boris Pavlovic (boris-42) wrote :
Mike Perez (thingee)
Changed in cinder:
milestone: none → kilo-2
Changed in cinder:
assignee: nobody → John Griffith (john-griffith)
Revision history for this message
John Griffith (john-griffith) wrote :

My bad... I was confusing this with the old "delete snapshot", but this is on create; and a checksum error on /dev/loop0 that's causing the error

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/149360

Changed in cinder:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/149360
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=e302cefafc18754a4495ad061062732b97d326ea
Submitter: Jenkins
Branch: master

commit e302cefafc18754a4495ad061062732b97d326ea
Author: John Griffith <email address hidden>
Date: Thu Jan 22 11:22:25 2015 -0700

    Add retry to lvm snapshot create

    We have some occasional issues with snapshot-create
    failing for what looks to be conflicts with udev. It
    looks like this problem is a status conflict between LVM
    cache and udev, and in most cases the best way to get
    around this is to retry the command a few times until
    the cache and udev are back in sync.

    This patch uses the newly added retry decorator and
    for now we're just adding it to the snapshot create
    call. We're using the default values for interval and
    retry count but we can certainly adjust this as needed.

    Change-Id: Ic884b829591361da979fde968f5710237b34565f
    Closes-Bug: #1335905

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
status: Fix Committed → Fix Released
Revision history for this message
John Griffith (john-griffith) wrote :

Looks like we should probably backport this to Juno as well

Revision history for this message
Mike Perez (thingee) wrote :
Download full text (4.2 KiB)

Nope, still broken!

From dup bug:

http://logs.openstack.org/80/174480/15/check/check-tempest-dsvm-full/27b3bb4/console.html#_2015-04-29_15_26_48_298

2015-04-29 15:26:48.298 | tearDownClass (tempest.thirdparty.boto.test_ec2_instance_run.InstanceRunTest)
2015-04-29 15:26:48.298 | -----------------------------------------------------------------------------
2015-04-29 15:26:48.298 |
2015-04-29 15:26:48.299 | Captured traceback:
2015-04-29 15:26:48.299 | ~~~~~~~~~~~~~~~~~~~
2015-04-29 15:26:48.299 | Traceback (most recent call last):
2015-04-29 15:26:48.299 | File "tempest/test.py", line 288, in tearDownClass
2015-04-29 15:26:48.300 | teardown()
2015-04-29 15:26:48.300 | File "tempest/thirdparty/boto/test.py", line 283, in resource_cleanup
2015-04-29 15:26:48.300 | raise exceptions.TearDownException(num=fail_count)
2015-04-29 15:26:48.301 | tempest.exceptions.TearDownException: 1 cleanUp operation failed

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiRmlsZSBcXFwidGVtcGVzdC90aGlyZHBhcnR5L2JvdG8vdGVzdC5weVxcXCJcIiBBTkQgbWVzc2FnZTpcImluIHJlc291cmNlX2NsZWFudXBcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTQzMDMyMjY0NzIwNn0=

The tempest log has a more specific error:

http://logs.openstack.org/80/174480/15/check/check-tempest-dsvm-full/27b3bb4/logs/tempest.txt.gz?level=TRACE#_2015-04-29_15_22_12_234

2015-04-29 15:22:12.234 5359 ERROR tempest.thirdparty.boto.test [-] Cleanup failed APIClientEC2.deregister_image(aki-00000010)
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test Traceback (most recent call last):
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test File "tempest/thirdparty/boto/test.py", line 272, in resource_cleanup
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test function(*pos_args, **kw_args)
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test File "tempest/services/botoclients.py", line 68, in func
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test return getattr(conn, name)(*args, **kwargs)
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test File "/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/boto/ec2/connection.py", line 394, in deregister_image
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test params, verb='POST')
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test File "/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/boto/connection.py", line 1227, in get_status
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test raise self.ResponseError(response.status, response.reason, body)
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test EC2ResponseError: EC2ResponseError: 404 Not Found
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test <?xml version="1.0"?>
2015-04-29 15:22:12.234 5359 TRACE tempest.thirdparty.boto.test <Response><Errors><Error><Code>ImageNotFound</Code><Message>Image 19f1e137-3765-48ef-a57e-ab166d26d746 could not be found.</Message></Error></Errors><RequestID>req-5cc6958d-3ae1-40f1-9b8e-3afd08c7d494</RequestID></Response>
2015-04-29 15:22:...

Read more...

Changed in cinder:
status: Fix Released → Confirmed
milestone: kilo-2 → liberty-1
Mike Perez (thingee)
tags: added: gate-failure
Revision history for this message
John Griffith (john-griffith) wrote :

@Mike
This is a different bug, this issue was for snapshot-create, the issue being seen currently that you're marking as "still broken" and dup is: 1450110 which is specifically on snapshot-delete.

Two separate issues IMO

Revision history for this message
Mike Perez (thingee) wrote :

@John Ok fixed, thanks!

Changed in cinder:
status: Confirmed → Fix Released
milestone: liberty-1 → none
no longer affects: cinder/kilo
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.