Detached and deleted RBD volumes remain associated with instance

Bug #1083818 reported by Adam Gandelman
This bug affects 1 person
Affects                   Status        Importance  Assigned to     Milestone
Cinder                    Fix Released  Undecided   Adam Gandelman
  Folsom                  Fix Released  Critical    Adam Gandelman
OpenStack Compute (nova)  Invalid       Undecided   Unassigned
  Folsom                  Fix Released  Critical    Adam Gandelman
cinder (Ubuntu)           Fix Released  Undecided   Unassigned
  Quantal                 Fix Released  Undecided   Unassigned
nova (Ubuntu)             Fix Released  Undecided   Unassigned
  Quantal                 Fix Released  Undecided   Unassigned

Bug Description

Using the RBD driver with nova-volume, volumes can be created and attached to instances okay. However, once they have been detached and deleted, an association between the instance and the volume remains, causing issues for the EC2 API.

$ euca-run-instances -t m1.tiny ami-01
$ euca-create-volume -s 1 -z nova
$ euca-describe-instances
$ euca-attach-volume -i i-014 vol-0f -d /dev/vdc
$ euca-describe-volumes
$ euca-describe-instances
VolumeNotFound: Volume vol-0000000f could not be found.

The equivalent OSAPI commands can be used to trigger the same end result (a broken euca-describe-instances). 'nova list' and 'nova show $server_id' work okay, but instances can no longer be queried using the EC2 API.

Shortly after the volume has been detached from the instance, the nova-volume.log shows the following trace:

2012-11-27 17:42:45 8727 ERROR nova.openstack.common.rpc.amqp [-] Exception during message handling
2012-11-27 17:42:45 8727 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2012-11-27 17:42:45 8727 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 276, in _process_data
2012-11-27 17:42:45 8727 TRACE nova.openstack.common.rpc.amqp rval = self.proxy.dispatch(ctxt, version, method, **args)
2012-11-27 17:42:45 8727 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 145, in dispatch
2012-11-27 17:42:45 8727 TRACE nova.openstack.common.rpc.amqp return getattr(proxyobj, method)(ctxt, **kwargs)
2012-11-27 17:42:45 8727 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/volume/manager.py", line 316, in detach_volume
2012-11-27 17:42:45 8727 TRACE nova.openstack.common.rpc.amqp if volume_ref['name'] not in volume_ref['provider_location']:
2012-11-27 17:42:45 8727 TRACE nova.openstack.common.rpc.amqp TypeError: argument of type 'NoneType' is not iterable
2012-11-27 17:42:45 8727 TRACE nova.openstack.common.rpc.amqp
2012-11-27 17:42:45 8727 ERROR nova.openstack.common.rpc.common [-] Returning exception argument of type 'NoneType' is not iterable to caller
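
The failing expression in the trace is a membership test against a field that is None for RBD volumes. A minimal sketch of that failure, plus one way to guard against it, is below. This is not the merged patch: the dictionaries are hypothetical stand-ins for volume_ref, the sample values are invented, and the function names are made up for illustration.

```python
# Hypothetical stand-ins for volume_ref; only the two keys used by the
# failing check are modelled, and the values are invented.
rbd_volume = {
    "name": "volume-example-rbd",
    "provider_location": None,  # assumption: RBD leaves this field unset
}
iscsi_volume = {
    "name": "volume-example-iscsi",
    "provider_location": "10.0.0.5:3260,1 iqn.example:volume-example-iscsi 1",
}


def needs_fixing_unguarded(volume_ref):
    # Same shape as the expression on the traceback line. `x not in None`
    # raises TypeError because None does not support membership tests.
    return volume_ref["name"] not in volume_ref["provider_location"]


def needs_fixing_guarded(volume_ref):
    # Only inspect provider_location when the driver actually set it.
    location = volume_ref["provider_location"]
    return bool(location) and volume_ref["name"] not in location


try:
    needs_fixing_unguarded(rbd_volume)
except TypeError as exc:
    print("unguarded check failed:", exc)

print(needs_fixing_guarded(rbd_volume))    # field unset, nothing to fix
print(needs_fixing_guarded(iscsi_volume))  # name already in the location
```

Because the exception is raised inside detach_volume, it seems likely that the rest of the detach bookkeeping never runs, which would match the stale association described in this report.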

The nova-api-ec2.log for describe_instances request shows:

2012-11-27 17:55:33 DEBUG nova.api.ec2 [req-ec9747d4-2e62-4143-9e02-b4175f709733 93027170f4ea40d3ae102bc62b105930 db00f4c9f11d4f78b7774cefedde86c8] action: DescribeInstances __call__ /usr/lib/python2.7/dist-packages/nova/api/ec2/__init__.py:328
2012-11-27 17:55:33 DEBUG nova.compute.api [req-ec9747d4-2e62-4143-9e02-b4175f709733 93027170f4ea40d3ae102bc62b105930 db00f4c9f11d4f78b7774cefedde86c8] Searching by: {'deleted': False} get_all /usr/lib/python2.7/dist-packages/nova/compute/api.py:1109
2012-11-27 17:55:33 INFO nova.api.ec2 [req-ec9747d4-2e62-4143-9e02-b4175f709733 93027170f4ea40d3ae102bc62b105930 db00f4c9f11d4f78b7774cefedde86c8] VolumeNotFound raised: Volume d33d8a21-f5e6-409f-90e5-b9ba119dba8c could not be found.
2012-11-27 17:55:33 ERROR nova.api.ec2 [req-ec9747d4-2e62-4143-9e02-b4175f709733 93027170f4ea40d3ae102bc62b105930 db00f4c9f11d4f78b7774cefedde86c8] VolumeNotFound: Volume vol-0000000f could not be found.
2012-11-27 17:55:33 INFO nova.api.ec2 [req-ec9747d4-2e62-4143-9e02-b4175f709733 93027170f4ea40d3ae102bc62b105930 db00f4c9f11d4f78b7774cefedde86c8] 0.162915s 192.168.20.1 POST /services/Cloud/ CloudController:DescribeInstances 400 [Boto/2.3.0 (linux2)] application/x-www-form-urlencoded text/xml
2012-11-27 17:55:33 INFO nova.ec2.wsgi.server [req-ec9747d4-2e62-4143-9e02-b4175f709733 93027170f4ea40d3ae102bc62b105930 db00f4c9f11d4f78b7774cefedde86c8] 192.168.20.1 - - [27/Nov/2012 17:55:33] "POST /services/Cloud/ HTTP/1.1" 400 333 0.163696

At this point, there is an entry in the block_device_mapping table that associates the instance and old volume, but it is not marked deleted (the deleted column is still '0').
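
For anyone wanting to check their own database for the same condition, here is a rough sketch of the kind of query that finds such rows. It runs against a toy in-memory SQLite schema, not a real nova database: the column names (instance_uuid, volume_id, deleted) and the sample rows are assumptions for illustration and should be checked against the actual schema before use.

```python
import sqlite3

# Toy schema, assumed for illustration only; the real tables have many
# more columns and may differ between releases.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE volumes (
    id TEXT PRIMARY KEY,
    deleted INTEGER NOT NULL
);
CREATE TABLE block_device_mapping (
    id INTEGER PRIMARY KEY,
    instance_uuid TEXT,
    volume_id TEXT,
    device_name TEXT,
    deleted INTEGER NOT NULL
);
""")

# Invented sample data: vol-a has been deleted, vol-b is still live.
conn.executemany("INSERT INTO volumes VALUES (?, ?)",
                 [("vol-a", 1), ("vol-b", 0)])
conn.executemany(
    "INSERT INTO block_device_mapping VALUES (?, ?, ?, ?, ?)",
    [
        (1, "inst-1", "vol-a", "/dev/vdc", 0),  # stale: volume is gone
        (2, "inst-1", "vol-b", "/dev/vdd", 0),  # healthy mapping
    ],
)

# Live mappings whose volume is either missing or marked deleted.
STALE_QUERY = """
SELECT bdm.id, bdm.instance_uuid, bdm.volume_id
FROM block_device_mapping AS bdm
LEFT JOIN volumes AS v ON v.id = bdm.volume_id
WHERE bdm.deleted = 0
  AND bdm.volume_id IS NOT NULL
  AND (v.id IS NULL OR v.deleted != 0)
"""

stale_rows = conn.execute(STALE_QUERY).fetchall()
print(stale_rows)
```

The sketch only reports candidate rows; it deliberately does not modify anything.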

Found when testing the current stable/folsom in preparation for 2012.2.1. Have not tested using Grizzly / master.

Adam Gandelman (gandelman-a) wrote :

Bisected back to e3d7f8c7de9bad73bf1f9b5ee9b2cf46eb452351 / https://review.openstack.org/#/c/15005/

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/17023

Changed in cinder:
assignee: nobody → Adam Gandelman (gandelman-a)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/folsom)

Fix proposed to branch: stable/folsom
Review: https://review.openstack.org/17024

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/17023
Committed: http://github.com/openstack/cinder/commit/d030c5b10e9e8d73d967562259e7db6146347108
Submitter: Jenkins
Branch: master

commit d030c5b10e9e8d73d967562259e7db6146347108
Author: Adam Gandelman <email address hidden>
Date: Tue Nov 27 17:51:26 2012 -0800

    Improve provider_location cleanup code for RBD.

    The RBD driver does not make use of the 'provider_location' field
    but the current cleanup code assumes it does. Ensure the field
    is in use before testing whether or not it needs fixing.

    Fixes bug 1083818.

    Change-Id: Id6ff85101f85e70575ba244c2df7aca0196cf224

Changed in cinder:
status: In Progress → Fix Committed
Mark McLoughlin (markmc)
Changed in nova:
status: New → Invalid
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/folsom)

Reviewed: https://review.openstack.org/17024
Committed: http://github.com/openstack/nova/commit/e7877868b13ff615feb61a2ac229b160d1f51283
Submitter: Jenkins
Branch: stable/folsom

commit e7877868b13ff615feb61a2ac229b160d1f51283
Author: Adam Gandelman <email address hidden>
Date: Tue Nov 27 17:44:27 2012 -0800

    Improve provider_location cleanup code for RBD.

    The RBD driver does not make use of the 'provider_location' field
    but the current cleanup code assumes it does. Ensure the field
    is in use before testing whether or not it needs fixing.

    Fixes bug 1083818.

    Note: Applying to stable/folsom nova-volume code. Same fix
    proposed to cinder @ https://review.openstack.org/#/c/17023/

    Change-Id: I814bd253d2e850534a673985e499f8cdbcebb18e

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/folsom)

Fix proposed to branch: stable/folsom
Review: https://review.openstack.org/17069

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/folsom)

Reviewed: https://review.openstack.org/17069
Committed: http://github.com/openstack/cinder/commit/940f363dafd8a511e0c37b8a1ce1370e36c5a835
Submitter: Jenkins
Branch: stable/folsom

commit 940f363dafd8a511e0c37b8a1ce1370e36c5a835
Author: Adam Gandelman <email address hidden>
Date: Tue Nov 27 17:51:26 2012 -0800

    Improve provider_location cleanup code for RBD.

    The RBD driver does not make use of the 'provider_location' field
    but the current cleanup code assumes it does. Ensure the field
    is in use before testing whether or not it needs fixing.

    Fixes bug 1083818.

    Change-Id: Id6ff85101f85e70575ba244c2df7aca0196cf224
    (cherry picked from commit d030c5b10e9e8d73d967562259e7db6146347108)

Changed in cinder (Ubuntu):
status: New → Fix Released
Changed in cinder (Ubuntu Quantal):
status: New → Confirmed
Changed in nova (Ubuntu):
status: New → Fix Released
Changed in nova (Ubuntu Quantal):
status: New → Confirmed
summary: - Detached and deleted RBD volumes remain associated with insance
+ Detached and deleted RBD volumes remain associated with instance
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Adam, or anyone else affected,

Accepted cinder into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/cinder/2012.2.1-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cinder (Ubuntu Quantal):
status: Confirmed → Fix Committed
tags: added: verification-needed
Clint Byrum (clint-fewbar) wrote :

Hello Adam, or anyone else affected,

Accepted nova into quantal-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/nova/2012.2.1+stable-20121212-a99a802e-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in nova (Ubuntu Quantal):
status: Confirmed → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
milestone: none → grizzly-2
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cinder - 2012.2.1-0ubuntu1

---------------
cinder (2012.2.1-0ubuntu1) quantal-proposed; urgency=low

  * Ubuntu updates:
    - Cinder should suggest ceph-common, not python-ceph (LP: #1065901):
      - debian/control: cinder-volume Suggests: python-ceph -> ceph-common
  * Resynchronize with stable/folsom (87d839a5) (LP: #1085255):
    - [f990ff0] Remove unused python-daemon dependency
    - [940f363] Detached and deleted RBD volumes remain associated with insance
      (LP: #1083818)
    - [7f34ba3] After folsom upgrade, instances can no longer access existing
      volumes. (LP: #1065702)
    - [1c99b24] Jenkins jobs fail because of incompatibility between sqlalchemy-
      migrate and the newest sqlalchemy-0.8.0b1 (LP: #1073569)
    - [d12d4b6] Add SIGPIPE handler to subprocess execution in rootwrap and
      utils.execute (LP: #1053364)
    - [ce5e002] Set defaultbranch in .gitreview to stable/folsom
 -- Adam Gandelman <email address hidden> Tue, 04 Dec 2012 09:19:29 -0800

Changed in cinder (Ubuntu Quantal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2012.2.1+stable-20121212-a99a802e-0ubuntu1

---------------
nova (2012.2.1+stable-20121212-a99a802e-0ubuntu1) quantal-proposed; urgency=low

  * Ubuntu updates:
    - debian/control: Ensure novaclient is upgraded with nova,
      require python-keystoneclient >= 1:2.9.0. (LP: #1073289)
    - d/p/avoid_setuptools_git_dependency.patch: Refresh.
  * Dropped patches, applied upstream:
    - debian/patches/CVE-2012-5625.patch: [a99a802]
  * Resynchronize with stable/folsom (b55014ca) (LP: #1085255):
    - [a99a802] create_lvm_image allocates dirty blocks (LP: #1070539)
    - [670b388] RPC exchange name defaults to 'openstack' (LP: #1083944)
    - [3ede373] disassociate_floating_ip with multi_host=True fails
      (LP: #1074437)
    - [22d7c3b] libvirt imagecache should handle shared image storage
      (LP: #1075018)
    - [e787786] Detached and deleted RBD volumes remain associated with insance
      (LP: #1083818)
    - [9265eb0] live_migration missing migrate_data parameter in Hyper-V driver
      (LP: #1066513)
    - [3d99848] use_single_default_gateway does not function correctly
      (LP: #1075859)
    - [65a2d0a] resize does not migrate DHCP host information (LP: #1065440)
    - [102c76b] Nova backup image fails (LP: #1065053)
    - [48a3521] Fix config-file overrides for nova-dhcpbridge
    - [69663ee] Cloudpipe in Folsom: no such option: cnt_vpn_clients
      (LP: #1069573)
    - [6e47cc8] DisassociateAddress can cause Internal Server Error
      (LP: #1080406)
    - [22c3d7b] API calls to dis-associate an auto-assigned floating IP should
      return proper warning (LP: #1061499)
    - [bd11d15] libvirt: if exception raised during volume_detach, volume state
      is inconsistent (LP: #1057756)
    - [dcb59c3] admin can't describe all images in ec2 api (LP: #1070138)
    - [78de622] Incorrect Exception raised during Create server when metadata
      over 255 characters (LP: #1004007)
    - [c313de4] Fixed IP isn't released before updating DHCP host file
      (LP: #1078718)
    - [f4ab42d] Enabling Return Reservation ID with XML create server request
      returns no body (LP: #1061124)
    - [3db2a38] 'BackupCreate' should accept rotation parameter greater than or
      equal to zero (LP: #1071168)
    - [f7e5dde] libvirt reboot sometimes fails to reattach volumes
      (LP: #1073720)
    - [ff776d4] libvirt: detaching volume may fail while terminating other
      instances on the same host concurrently (LP: #1060836)
    - [85a8bc2] Used instance uuid rather than id in remove-fixed-ip
    - [42a85c0] Fix error on invalid delete_on_termination value
    - [6a17579] xenapi migrations fail w/ swap (LP: #1064083)
    - [97649b8] attach-time field for volumes is not updated for detach volume
      (LP: #1056122)
    - [8f6a718] libvirt: rebuild is not using kernel and ramdisk associated with
      the new image (LP: #1060925)
    - [fbe835f] live-migration and volume host assignement (LP: #1066887)
    - [c2a9150] typo prevents volume_tmp_dir flag from working (LP: #1071536)
    - [93efa21] Instances deleted during spawn leak network allocations
      (LP: #1068716)
    - [ebabd02] After restartin...


Changed in nova (Ubuntu Quantal):
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: grizzly-2 → 2013.1