Volume detach failure because of invalid bdm.connection_info

Bug #1327218 reported by Mark McLoughlin on 2014-06-06
This bug affects 19 people
Affects / Importance / Assigned to:
- OpenStack Compute (nova): High, Matt Riedemann
- OpenStack Compute (nova) Juno series: Undecided, Unassigned
- nova (Ubuntu): Undecided, Unassigned
- nova (Ubuntu) Trusty series: High, Unassigned

Bug Description

Example of this here:

http://logs.openstack.org/33/97233/1/check/check-grenade-dsvm/f7b8a11/logs/old/screen-n-cpu.txt.gz?level=TRACE#_2014-06-02_14_13_51_125

  File "/opt/stack/old/nova/nova/compute/manager.py", line 4153, in _detach_volume
    connection_info = jsonutils.loads(bdm.connection_info)
  File "/opt/stack/old/nova/nova/openstack/common/jsonutils.py", line 164, in loads
    return json.loads(s)
  File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer
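The last frame is the giveaway: bdm.connection_info is still NULL in the database, so the field comes back as None and json.loads() rejects it. A minimal standalone reproduction (on Python 2 the message is "expected string or buffer"; Python 3 mentions NoneType instead):

```python
import json

# bdm.connection_info was never saved, so the object field is None
connection_info = None

try:
    json.loads(connection_info)
except TypeError as exc:
    # Python 2: "expected string or buffer"
    # Python 3: "the JSON object must be str, bytes or bytearray, not NoneType"
    print("TypeError: %s" % exc)
```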

This was in grenade with stable/icehouse nova commit 7431cb9.

There's nothing unusual about the test which triggers this: it simply attaches a volume to an instance, waits for it to show up in the instance, and then tries to detach it.
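That sequence can be sketched as a simple poll-then-detach loop (an illustrative helper, not tempest's actual code):

```python
import time

def wait_for_volume_status(get_status, wanted='in-use', timeout=60, interval=1.0):
    """Illustrative poll helper (not tempest's actual code): block until
    the volume reaches the wanted status or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if get_status() == wanted:
            return
        time.sleep(interval)
    raise RuntimeError("volume never reached status %r" % wanted)

# The failing sequence: attach, wait for 'in-use', detach immediately.
# Because nova's attach is an RPC cast, the volume can report 'in-use'
# before bdm.connection_info is saved, so the detach can still hit NULL.
statuses = iter(['attaching', 'in-use'])
wait_for_volume_status(lambda: next(statuses), interval=0)
```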

logstash query for this:

  message:"Exception during message handling" AND message:"expected string or buffer" AND message:"connection_info = jsonutils.loads(bdm.connection_info)" AND tags:"screen-n-cpu.txt"

It seems to be very rare, though.

Mark McLoughlin (markmc) wrote :

Looks in the same ballpark as bug #1302774

description: updated
Mark McLoughlin (markmc) wrote :

Here's a case of the DiskNotFound traceback from bug #1302774 and the traceback from this bug in the same log:

http://logs.openstack.org/33/96333/2/check/check-grenade-dsvm/105af93/logs/old/screen-n-cpu.txt.gz?level=TRACE

Nikola Đipanov (ndipanov) wrote :

Looking at logstash, it looks like this did not get fixed by the cinder fix https://review.openstack.org/#/c/90353/. That fix landed on June 10th but we have a hit on June 11th.

Let's monitor this a bit more to be sure, but it seems there are more races in there.

Joe Gordon (jogo) wrote :

No hits for this one; marking as resolved.

Changed in nova:
status: Triaged → Fix Committed
Thierry Carrez (ttx) on 2014-09-05
Changed in nova:
milestone: none → juno-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2014-10-16
Changed in nova:
milestone: juno-3 → 2014.2
Matt Riedemann (mriedem) wrote :

In one case, we're attaching the encrypted luks volume to the instance here:

http://logs.openstack.org/93/156693/7/check/check-tempest-dsvm-postgres-full/d3b26e8/logs/screen-n-cpu.txt.gz#_2015-03-12_16_38_09_061

We initialize the connection and get the connection_info back here:

http://logs.openstack.org/93/156693/7/check/check-tempest-dsvm-postgres-full/d3b26e8/logs/screen-n-cpu.txt.gz#_2015-03-12_16_38_11_064

I see an os-attach call here:

http://logs.openstack.org/93/156693/7/check/check-tempest-dsvm-postgres-full/d3b26e8/logs/screen-n-cpu.txt.gz#_2015-03-12_16_38_15_223

We start detaching the volume here:

http://logs.openstack.org/93/156693/7/check/check-tempest-dsvm-postgres-full/d3b26e8/logs/screen-n-cpu.txt.gz#_2015-03-12_16_38_16_902

We're failing to detach the volume here:

http://logs.openstack.org/93/156693/7/check/check-tempest-dsvm-postgres-full/d3b26e8/logs/screen-n-cpu.txt.gz#_2015-03-12_16_38_17_567

And six minutes later we're terminating the bdm for that volume here:

http://logs.openstack.org/93/156693/7/check/check-tempest-dsvm-postgres-full/d3b26e8/logs/screen-n-cpu.txt.gz#_2015-03-12_16_44_54_876

After failing to detach, I'm also seeing the same volume_id showing up in the logs in other test runs:

VolumesV1SnapshotTestJSON:

http://logs.openstack.org/93/156693/7/check/check-tempest-dsvm-postgres-full/d3b26e8/logs/screen-n-cpu.txt.gz#_2015-03-12_16_42_03_507

TestMinimumBasicScenario:

http://logs.openstack.org/93/156693/7/check/check-tempest-dsvm-postgres-full/d3b26e8/logs/screen-n-cpu.txt.gz#_2015-03-12_16_44_40_119

Matt Riedemann (mriedem) wrote :

This should help with some of the confusion when tracing through the n-cpu debug logs:

https://review.openstack.org/#/c/164259/

The test that fails is for cryptsetup, but the volume type had 'luks' in the name, which is confusing.

Fix proposed to branch: master
Review: https://review.openstack.org/164330

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Confirmed → In Progress
Matt Riedemann (mriedem) on 2015-03-13
tags: added: juno-backport-potential
haruka tanizawa (h-tanizawa) wrote :

This patch https://review.openstack.org/#/c/163937/ also seems to avoid a NULL connection_info.
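For illustration, a defensive guard along those lines might look like this (a hypothetical helper, not the linked patch's actual code):

```python
import json

def load_connection_info(raw):
    """Hypothetical defensive guard (not the linked patch's actual code):
    a BDM whose connection_info was never saved yields None, so fall back
    to an empty dict instead of letting json.loads() raise TypeError in
    the detach path."""
    if raw is None:
        return {}
    return json.loads(raw)
```

Whether to fall back silently or raise a clearer error is a judgment call; the fix that eventually merged instead removes the race on the attach side.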

Reviewed: https://review.openstack.org/164330
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6fb2ef96d6aaf9ca0ad394fd7621ef1e6003f5a1
Submitter: Jenkins
Branch: master

commit 6fb2ef96d6aaf9ca0ad394fd7621ef1e6003f5a1
Author: Matt Riedemann <email address hidden>
Date: Wed Mar 18 12:42:42 2015 -0700

    Save bdm.connection_info before calling volume_api.attach_volume

    There is a race in attach/detach of a volume where the volume status
    goes to 'in-use' before the bdm.connection_info data is stored in the
    database. Since attach is a cast, the caller can see the volume go to
    'in-use' and immediately try to detach the volume and blow up in the
    compute manager because bdm.connection_info isn't yet stored in the
    database.

    This fixes the issue by saving the connection_info immediately before
    calling volume_api.attach_volume (which sets the volume status to
    'in-use').

    Closes-Bug: #1327218

    Change-Id: Ib95c8f7b66aca0c4ac7b92d140cbeb5e85c2717f
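The ordering change in this commit can be sketched with stand-in classes (illustrative only, not nova's actual objects):

```python
import json

class FakeBDM(object):
    """Stand-in for nova's BlockDeviceMapping object."""
    def __init__(self):
        self.connection_info = None
        self.saved = False
    def save(self):
        self.saved = True  # would persist the row to the nova database

class FakeVolumeAPI(object):
    """Stand-in for the cinder volume API."""
    def __init__(self):
        self.status = 'attaching'
    def attach_volume(self):
        # Once this runs, a caller polling cinder sees 'in-use' and may
        # immediately start a detach.
        self.status = 'in-use'

def attach(bdm, volume_api, connection_info):
    # The fix: persist connection_info *before* the volume flips to
    # 'in-use', so a racing detach never reads NULL from the database.
    bdm.connection_info = json.dumps(connection_info)
    bdm.save()
    volume_api.attach_volume()

bdm, api = FakeBDM(), FakeVolumeAPI()
attach(bdm, api, {'driver_volume_type': 'iscsi'})
assert api.status == 'in-use'
assert bdm.saved and bdm.connection_info is not None
```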

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2015-03-20
Changed in nova:
milestone: none → kilo-3
status: Fix Committed → Fix Released

Reviewed: https://review.openstack.org/166017
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bbf6348997fee02f9dadd556565f44005e2c7f23
Submitter: Jenkins
Branch: stable/juno

commit bbf6348997fee02f9dadd556565f44005e2c7f23
Author: Matt Riedemann <email address hidden>
Date: Wed Mar 18 12:42:42 2015 -0700

    Save bdm.connection_info before calling volume_api.attach_volume

    There is a race in attach/detach of a volume where the volume status
    goes to 'in-use' before the bdm.connection_info data is stored in the
    database. Since attach is a cast, the caller can see the volume go to
    'in-use' and immediately try to detach the volume and blow up in the
    compute manager because bdm.connection_info isn't yet stored in the
    database.

    This fixes the issue by saving the connection_info immediately before
    calling volume_api.attach_volume (which sets the volume status to
    'in-use').

    Closes-Bug: #1327218

    Conflicts:
            nova/tests/unit/compute/test_compute.py
            nova/tests/unit/virt/test_block_device.py
            nova/virt/block_device.py

    NOTE(mriedem): The block_device conflicts are due to using dot
    notation when accessing object fields and in kilo the context is
    no longer passed to bdm.save(). The test conflicts are due to moving
    the test modules in kilo and passing the context on save().

    Change-Id: Ib95c8f7b66aca0c4ac7b92d140cbeb5e85c2717f
    (cherry picked from commit 6fb2ef96d6aaf9ca0ad394fd7621ef1e6003f5a1)

tags: added: in-stable-juno
Thierry Carrez (ttx) on 2015-04-30
Changed in nova:
milestone: kilo-3 → 2015.1.0
Changed in nova (Ubuntu Trusty):
assignee: nobody → Edward Hope-Morley (hopem)
importance: Undecided → High
Xi Yang (xi-yang) wrote :

I am not sure whether this bug is fixed; I can reproduce it on the master branch.

My test environment is set up with commit 172d4a00ce609da7ea6d8d97f635e6c9afecb373.

2015-08-12 23:23:15.793 INFO nova.compute.manager [req-6e6eb653-cb52-47a2-93e5-d520633e4e10 admin admin] [instance: 0030bb4e-10f3-4300-9ab7-2a2bd609678f] Detach volume fe82d173-1500-4bfb-a541-3046b46c8be0 from mountpoint /dev/vdb
 ----BlockDeviceMapping(boot_index=None,connection_info=None,created_at=2015-08-12T13:38:16Z,delete_on_termination=False,deleted=False,deleted_at=None,destination_type='volume',device_name='/dev/vdb',device_type=None,disk_bus=None,guest_format=None,id=2685,image_id=None,instance=<?>,instance_uuid=0030bb4e-10f3-4300-9ab7-2a2bd609678f,no_device=False,snapshot_id=None,source_type='volume',updated_at=None,volume_id='fe82d173-1500-4bfb-a541-3046b46c8be0',volume_size=None) --- (This is printed by me)
2015-08-12 23:23:15.808 ERROR oslo_messaging.rpc.dispatcher [req-6e6eb653-cb52-47a2-93e5-d520633e4e10 admin admin] Exception during message handling: <type 'NoneType'> can't be decoded
2015-08-12 23:23:15.808 136462 ERROR oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2015-08-12 23:23:15.808 136462 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2015-08-12 23:23:15.808 136462 ERROR oslo_messaging.rpc.dispatcher executor_callback))
2015-08-12 23:23:15.808 136462 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2015-08-12 23:23:15.808 136462 ERROR oslo_messaging.rpc.dispatcher executor_callback)
2015-08-12 23:23:15.808 136462 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 129, in _do_dispatch
2015-08-12 23:23:15.808 136462 ERROR oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2015-08-12 23:23:15.808 136462 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/exception.py", line 89, in wrapped
2015-08-12 23:23:15.808 136462 ERROR oslo_messaging.rpc.dispatcher payload)
2015-08-12 23:23:15.808 136462 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 119, in __exit__
2015-08-12 23:23:15.808 136462 ERROR oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2015-08-12 23:23:15.808 136462 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/exception.py", line 72, in wrapped
2015-08-12 23:23:15.808 136462 ERROR oslo_messaging.rpc.dispatcher return f(self, context, *args, **kw)
2015-08-12 23:23:15.808 136462 ERROR oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 364, in decorated_function
2015-08-12 23:23:15.808 136462 ERROR oslo_messaging.rpc.dispatcher kwargs['instance'], e, sys.exc_info())
2015-08-12 23:23:15.808 136462 ERROR oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 119, in __exit__
2015-08-12 23:23:15.808 136462 ERROR oslo_messaging.rpc.dis...


Liam Young (gnuoy) wrote :

The fix went into 2015.1.0, and 2015.1.1 is now in the cloud archive.

Changed in nova (Ubuntu):
status: New → Fix Released
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nova (Ubuntu Trusty):
status: New → Confirmed
George (lmihaiescu) wrote :

Do you know when/if this will be backported to Juno?

Thank you.

Edward Hope-Morley (hopem) wrote :

@lmihaiescu Hi, this patch is already in stable/juno and as such will be included in the next point release of Juno (2014.2.4) but is not yet targeted for SRU into Juno.

Changed in nova (Ubuntu Trusty):
assignee: Edward Hope-Morley (hopem) → nobody