Multiple attempts to swap volumes using volume-update fail

Bug #1661016 reported by Lee Yarwood on 2017-02-01
This bug affects 2 people
Affects                   Importance  Assigned to
OpenStack Compute (nova)  Medium      Lee Yarwood
Newton                    Medium      Lee Yarwood
Ocata                     Medium      Lee Yarwood

Bug Description

Description
===========
The second and any subsequent attempt to swap volumes using volume-update fails because the BDM (block device mapping) lookup still uses the original volume ID (see Logs & Configs for an example).

A previous attempt to fix this was made in bug #1490236 and reverted by bug #1625660.

Steps to reproduce
==================
- Boot an instance
- Create multiple volumes
- Attach a single volume
- Swap the attached volume with one that is unattached via nova volume-update.
- Swap the newly attached volume with one that is unattached via nova volume-update (second swap).

Expected result
===============
The second attempt succeeds and the new volume is now attached to the instance.

Actual result
=============
The second attempt fails looking up a BDM with the ID of the original volume.

Environment
===========
1. Exact version of OpenStack you are running. See the following
  list for all releases: http://docs.openstack.org/releases/
   $ pwd
   /opt/stack/nova
   $ git rev-parse HEAD
   dae6b760b9c40bbf3b72a0218dbf1dbc823f30e2

2. Which hypervisor did you use?
   (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
   What's the version of that?

   Libvirt + KVM

3. Which storage type did you use?
   (For example: Ceph, LVM, GPFS, ...)
   What's the version of that?

   LVM/iSCSI

4. Which networking type did you use?
   (For example: nova-network, Neutron with OpenVSwitch, ...)

   n/a

Logs & Configs
==============
$ nova boot --image cirros-0.3.4-x86_64-uec --flavor 1 test-boot
$ cinder create 1 ; cinder create 1

$ nova volume-attach ef426f1e-32e4-4f8c-a3fc-b58080d38294 \
                     23933e67-a4c0-4de9-b0dc-3da37bce1b78

$ nova volume-update ef426f1e-32e4-4f8c-a3fc-b58080d38294 \
                     23933e67-a4c0-4de9-b0dc-3da37bce1b78 \
                     cace165f-9c97-4d6d-a0e8-ea087fa80263

$ nova volume-update ef426f1e-32e4-4f8c-a3fc-b58080d38294 \
                     cace165f-9c97-4d6d-a0e8-ea087fa80263 \
                     23933e67-a4c0-4de9-b0dc-3da37bce1b78

n-cpu.log :

2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server Traceback (most recent call last):
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 155, in _process_incoming
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 222, in dispatch
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 192, in _do_dispatch
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server result = func(ctxt, **new_args)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 75, in wrapped
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server function_name, call_dict, binary)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server self.force_reraise()
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/exception_wrapper.py", line 66, in wrapped
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server return f(self, context, *args, **kw)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 188, in decorated_function
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server LOG.warning(msg, e, instance=instance)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server self.force_reraise()
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 157, in decorated_function
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 216, in decorated_function
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info())
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server self.force_reraise()
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 204, in decorated_function
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 5052, in swap_volume
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server resize_to)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 4998, in _swap_volume
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server self.volume_api.unreserve_volume(context, new_volume_id)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server self.force_reraise()
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/compute/manager.py", line 4976, in _swap_volume
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server resize_to)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 1296, in swap_volume
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server nova_context.get_admin_context(), volume_id, instance.uuid)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 177, in wrapper
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server args, kwargs)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/rpcapi.py", line 239, in object_class_action_versions
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server args=args, kwargs=kwargs)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 169, in call
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server retry=self.retry)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 97, in _send
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server timeout=timeout, retry=retry)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 458, in send
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server retry=retry)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 449, in _send
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server raise result
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server VolumeBDMNotFound_Remote: No volume Block Device Mapping with id 23933e67-a4c0-4de9-b0dc-3da37bce1b78.
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server Traceback (most recent call last):
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/manager.py", line 92, in _object_dispatch
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server return getattr(target, method)(*args, **kwargs)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 184, in wrapper
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server result = fn(cls, context, *args, **kwargs)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server File "/opt/stack/nova/nova/objects/block_device.py", line 242, in get_by_volume_and_instance
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server raise exception.VolumeBDMNotFound(volume_id=volume_id)
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server VolumeBDMNotFound: No volume Block Device Mapping with id 23933e67-a4c0-4de9-b0dc-3da37bce1b78.
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server
2017-02-01 07:35:04.931 TRACE oslo_messaging.rpc.server

Changed in nova:
assignee: nobody → Lee Yarwood (lyarwood)
status: New → In Progress
Changed in nova:
assignee: Lee Yarwood (lyarwood) → Matthew Booth (mbooth-9)
Changed in nova:
assignee: Matthew Booth (mbooth-9) → Lee Yarwood (lyarwood)

Reviewed: https://review.openstack.org/427364
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1bf7ac804808eb0d4a68847e466f5fa7d1dcd55e
Submitter: Jenkins
Branch: master

commit 1bf7ac804808eb0d4a68847e466f5fa7d1dcd55e
Author: Lee Yarwood <email address hidden>
Date: Tue Jan 31 18:39:15 2017 +0000

    libvirt: Remove redundant bdm serial mangling and saving during swap_volume

    During an initial swap_volume call the serial of the original volume was
    previously stashed in the connection_info of the new volume by the
    compute layer. This was used by I19d5182d11 to allow the virt driver to
    look up and update the existing BDM with the new volume's connection_info
    after it had been used by connect_volume.

    Future calls to swap_volume in the compute layer would not
    update the serial found in the old volume connection_info. This would
    result in an invalid serial being copied into the new volume
    connection_info and used to perform a lookup of a BDM that didn't exist.

    To correct this we now explicitly set the serial of the new volume to
    that of the new volume id. While the correct serial id should already be
    present in the connection_info provided by most backend Cinder volume
    drivers the act of updating this dict is required by our own functional
    tests to invoke a failure case :

    https://git.io/vDmRE

    The serial is updated once more to match the volume id returned
    by os-migrate-volume-completion prior to the BDM being updated in the
    compute layer.

    The BDM lookup and save from the virt layer is also removed as the
    compute layer retains a reference to new_cinfo and will update the BDM
    with this, including any modifications, at the end of swap_volume.

    Finally, the associated Tempest admin test is also extended by the
    following change to now attempt a second volume swap to verify these
    changes :

    I2a2c80a164b9f75d0e7e0503a24194bedfc0e66b

    Closes-bug: #1661016
    Change-Id: If74dd9d8e7191047b6f1cd7e35b6fc667f004f91
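
The failure mode described in this commit message can be illustrated with a toy model. This is only a sketch and not nova's actual code: it assumes BDMs can be modelled as a plain dict keyed by volume id, and the names swap_volume, run_two_swaps, "vol-a" and "vol-b" are hypothetical and chosen purely for illustration. With the old behaviour the stale serial survives the first swap and the second swap looks up a BDM that no longer exists; with the serial set to the new volume id, both swaps succeed.

```python
class VolumeBDMNotFound(Exception):
    """Toy stand-in for the nova exception of the same name."""


def swap_volume(bdms, old_volume_id, new_volume_id, fixed):
    """Swap a volume in `bdms`, a dict of volume id -> connection_info.

    Hypothetical model: `fixed=False` mimics the serial stashing described
    in the commit message, `fixed=True` mimics the corrected behaviour.
    """
    old_cinfo = bdms[old_volume_id]
    new_cinfo = {"serial": new_volume_id}

    if not fixed:
        # Old behaviour: copy the serial of the old connection_info ...
        new_cinfo["serial"] = old_cinfo["serial"]
        # ... and let the virt layer use it to look the BDM up again.
        if new_cinfo["serial"] not in bdms:
            raise VolumeBDMNotFound(
                "No volume Block Device Mapping with id %s."
                % new_cinfo["serial"])
    # Fixed behaviour: the serial stays the new volume id, no extra lookup.

    # Compute layer: repoint the BDM at the new volume and save new_cinfo.
    del bdms[old_volume_id]
    bdms[new_volume_id] = new_cinfo


def run_two_swaps(fixed):
    """Attach vol-a, swap to vol-b, then swap back to vol-a."""
    a, b = "vol-a", "vol-b"
    bdms = {a: {"serial": a}}
    swap_volume(bdms, a, b, fixed)
    try:
        swap_volume(bdms, b, a, fixed)
    except VolumeBDMNotFound as exc:
        return "failed: %s" % exc
    return "ok: %s" % bdms


if __name__ == "__main__":
    print(run_two_swaps(fixed=False))
    print(run_two_swaps(fixed=True))
```

In this model the first call reports a lookup failure for the original volume id, mirroring the VolumeBDMNotFound error in the n-cpu.log excerpt, while the second call completes both swaps.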

Changed in nova:
status: In Progress → Fix Released
Matt Riedemann (mriedem) on 2017-02-10
tags: added: libvirt swap-volume volumes
Changed in nova:
importance: Undecided → Medium

Reviewed: https://review.openstack.org/431530
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4bf211c617814d46d590bb7b11806170f5c19484
Submitter: Jenkins
Branch: stable/ocata

commit 4bf211c617814d46d590bb7b11806170f5c19484
Author: Lee Yarwood <email address hidden>
Date: Tue Jan 31 18:39:15 2017 +0000

    libvirt: Remove redundant bdm serial mangling and saving during swap_volume

    Closes-bug: #1661016
    Change-Id: If74dd9d8e7191047b6f1cd7e35b6fc667f004f91
    (cherry picked from commit 1bf7ac804808eb0d4a68847e466f5fa7d1dcd55e)

Reviewed: https://review.openstack.org/431540
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a714ffe89f1201c2b238284be3a4817b497b161b
Submitter: Jenkins
Branch: stable/newton

commit a714ffe89f1201c2b238284be3a4817b497b161b
Author: Lee Yarwood <email address hidden>
Date: Tue Jan 31 18:39:15 2017 +0000

    libvirt: Remove redundant bdm serial mangling and saving during swap_volume

    Conflicts:
            nova/compute/manager.py

    NOTE(lyarwood): Simple conflict caused by I90d4ffcb adding an error
    notification for swap_volume.

    Closes-bug: #1661016
    Change-Id: If74dd9d8e7191047b6f1cd7e35b6fc667f004f91
    (cherry picked from commit 1bf7ac804808eb0d4a68847e466f5fa7d1dcd55e)
    (cherry picked from commit 4bf211c617814d46d590bb7b11806170f5c19484)

This issue was fixed in the openstack/nova 15.0.1 release.

This issue was fixed in the openstack/nova 14.0.5 release.

This issue was fixed in the openstack/nova 16.0.0.0b1 development milestone.

This issue was fixed in the openstack/tempest 16.0.0 release.
