glusterfs: Instance is not using the correct volume snapshot file after reboot

Bug #1304695 reported by Thang Pham
This bug affects 1 person
Affects                    Status        Importance  Assigned to   Milestone
OpenStack Compute (nova)   Fix Released  Medium      Thang Pham    -
  Icehouse                 Fix Released  Medium      Eric Harney   -

Bug Description

Instance is not using the correct volume snapshot file after reboot.

Steps to recreate the bug:
1. Create a volume.

2. Attach the volume to a running instance.

3. Take an online snapshot of the volume.
Note that the active file used by the instance has now switched to volume-<uuid>.<snapshot-uuid>.

4. Shut down the instance.

5. Start the instance.
If you invoke virsh dumpxml <instance>, you will see that nova re-attaches the base volume (volume-<uuid>) to the instance rather than the snapshot volume (volume-<uuid>.<snapshot-uuid>). The expected behavior is for the snapshot volume to be re-attached.

This bug can corrupt both the snapshot and the volume: once the base file is attached, writes that should land in the snapshot overlay go to the base file instead.
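
The same steps, scripted. This is a rough sketch using python-cinderclient and python-novaclient; the credentials, client versions, and instance ID are placeholders, not part of the original report:

from cinderclient import client as cinder_client
from novaclient import client as nova_client

# USER, PASS, TENANT, AUTH_URL and INSTANCE_ID are assumed placeholders.
cinder = cinder_client.Client('1', USER, PASS, TENANT, AUTH_URL)
nova = nova_client.Client('2', USER, PASS, TENANT, AUTH_URL)

vol = cinder.volumes.create(size=1)                        # 1. create a volume
nova.volumes.create_server_volume(INSTANCE_ID, vol.id)     # 2. attach it
snap = cinder.volume_snapshots.create(vol.id, force=True)  # 3. online snapshot
nova.servers.stop(INSTANCE_ID)                             # 4. shut down
nova.servers.start(INSTANCE_ID)                            # 5. start
# On the compute node, virsh dumpxml <instance> now shows the base
# volume-<uuid> file attached instead of volume-<uuid>.<snapshot-uuid>.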

It looks like the nova volume manager is using a stale copy of the block_device_mapping. The block_device_mapping needs to be refreshed in order for the updated volume snapshot to be used.

On power-on, the nova compute manager (nova/compute/manager.py) calls:
1. start_instance
2. _power_on
3. _get_instance_volume_block_device_info

The relevant part of this method is:
def _get_instance_volume_block_device_info(self, context, instance,
                                           refresh_conn_info=False,
                                           bdms=None):
    if not bdms:
        bdms = (block_device_obj.BlockDeviceMappingList.
                get_by_instance_uuid(context, instance['uuid']))
    block_device_mapping = (
        driver_block_device.convert_volumes(bdms) +
        driver_block_device.convert_snapshots(bdms) +
        driver_block_device.convert_images(bdms))
    ...
block_device_obj.BlockDeviceMappingList.get_by_instance_uuid() queries the database to construct the bdms, and the connection_info stored there is stale.
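
For context, the refresh_conn_info flag controls whether that stored data is trusted. Paraphrased from the Icehouse-era method body (a sketch, not a verbatim quote):

# Continuation of _get_instance_volume_block_device_info (paraphrased):
if not refresh_conn_info:
    # Trust whatever connection_info was saved in the database,
    # dropping entries that never had any -- the stale-data path.
    block_device_mapping = [bdm for bdm in block_device_mapping
                            if bdm.get('connection_info')]
else:
    # Re-fetch connection info from Cinder (initialize_connection)
    # and persist it back to the database.
    block_device_mapping = driver_block_device.refresh_conn_infos(
        block_device_mapping, context, instance, self.volume_api,
        self.driver)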

Thang Pham (thang-pham)
Changed in cinder:
assignee: nobody → Thang Pham (thang-pham)
Revision history for this message
Eric Harney (eharney) wrote :

This analysis does not sound quite right to me, but I could be missing something. connect_volume() uses data['name'] to determine the filename. This is populated with the info from the Cinder GlusterFS driver's initialize_connection(), which sets it correctly.

The responsibility for calculating which filename should be attached to the VM is purely on the Cinder side; Nova doesn't do that calculation.
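
For reference, the GlusterFS driver's initialize_connection() returns roughly the following; this is a simplified paraphrase of the Icehouse-era driver, so treat the exact fields as approximate:

def initialize_connection(self, volume, connector):
    # For a snapshotted volume, 'name' points at the newest file in
    # the qcow2 backing chain (volume-<uuid>.<snapshot-uuid>), not
    # the base file.
    active_file = self.get_active_image_from_info(volume)
    return {
        'driver_volume_type': 'glusterfs',
        'data': {
            'export': volume['provider_location'],
            'name': active_file,
        },
        'mount_point_base': self._get_mount_point_base(),
    }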

Revision history for this message
Thang Pham (thang-pham) wrote :

You are correct. I was only looking at where the file name came from in nova, but not at the actual source of the file name, which is initialize_connection.

Revision history for this message
Thang Pham (thang-pham) wrote :

I debugged this problem a bit more. It looks like the nova volume manager is using a stale copy of the block_device_mapping. The block_device_mapping needs to be refreshed in order for the updated volume snapshot to be used.

On power-on, the nova compute manager (nova/compute/manager.py) calls:
1. start_instance
2. _power_on
3. _get_instance_volume_block_device_info

The relevant part of this method is:
def _get_instance_volume_block_device_info(self, context, instance,
                                           refresh_conn_info=False,
                                           bdms=None):
    if not bdms:
        bdms = (block_device_obj.BlockDeviceMappingList.
                get_by_instance_uuid(context, instance['uuid']))
    block_device_mapping = (
        driver_block_device.convert_volumes(bdms) +
        driver_block_device.convert_snapshots(bdms) +
        driver_block_device.convert_images(bdms))
    ...
block_device_obj.BlockDeviceMappingList.get_by_instance_uuid() queries the database to construct the bdms, and the connection_info stored there is stale.

In order for this to work properly, refresh_conn_info must be set to True. However, since _get_instance_volume_block_device_info is called from many code paths, I am not sure that setting refresh_conn_info=True is a good solution; it would affect the other cinder drivers as well.
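
For context, the call site looks roughly like this (paraphrased from the Icehouse-era manager):

# nova/compute/manager.py (paraphrased): the power-on path never
# passes refresh_conn_info, so it defaults to False and the stale
# database rows win.
def _power_on(self, context, instance):
    network_info = self._get_instance_nw_info(context, instance)
    block_device_info = self._get_instance_volume_block_device_info(
        context, instance)
    self.driver.power_on(context, instance,
                         network_info, block_device_info)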

Thang Pham (thang-pham)
description: updated
Changed in nova:
importance: Undecided → Medium
Revision history for this message
Nikola Đipanov (ndipanov) wrote :

Looking at it more closely, I'd say that we just need to refresh the connection info once we are done with the snapshot.

We should be careful to avoid race conditions, though. Basically, what we need to do in both the volume_snapshot_create and volume_snapshot_delete methods of the libvirt driver is:

  from nova.objects import block_device as block_device_obj
  from nova.virt import block_device as driver_block_device

  bdm = block_device_obj.BlockDeviceMapping.get_by_volume_id(
      context, volume_id)
  driver_bdm = driver_block_device.DriverVolumeBlockDevice(bdm)
  driver_bdm.refresh_connection_info(context, instance,
                                     self._volume_api, self)

and you should be good to go.

The only caveat here is that refresh_connection_info will call out to Cinder's initialize_connection, which may be problematic while the snapshot is still in the "creating" state. But that's nothing a dumb polling loop won't solve (sigh). In reality we should probably rethink these interactions.
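
A minimal sketch of that polling idea, assuming a hypothetical helper name and nova's Cinder API wrapper (volume_api.get_snapshot):

import time

# Hypothetical helper: wait for the snapshot to leave 'creating'
# before calling refresh_connection_info, so Cinder's
# initialize_connection sees a settled snapshot.
def _wait_for_snapshot(volume_api, context, snapshot_id, timeout=60):
    deadline = time.time() + timeout
    while time.time() < deadline:
        snapshot = volume_api.get_snapshot(context, snapshot_id)
        if snapshot['status'] != 'creating':
            return snapshot
        time.sleep(1)
    raise RuntimeError('snapshot %s still creating after %s seconds'
                       % (snapshot_id, timeout))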

Revision history for this message
Thang Pham (thang-pham) wrote :

Nikola: Thank you for the pointer :) I will try out your advice above.

Changed in nova:
assignee: nobody → Thang Pham (thang-pham)
Mike Perez (thingee)
tags: added: drivers
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/87432

Changed in nova:
status: New → In Progress
Thang Pham (thang-pham)
Changed in cinder:
status: New → In Progress
Eric Harney (eharney)
no longer affects: cinder
tags: added: libvirt
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/87432
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=329b594436aad814e43740ea26841342a8772aff
Submitter: Jenkins
Branch: master

commit 329b594436aad814e43740ea26841342a8772aff
Author: Thang Pham <email address hidden>
Date: Mon Apr 14 22:47:06 2014 -0400

    libvirt: Refresh volume connection_info after volume snapshot

    The following patch is related to the guest assisted snapshot
    functionality. When you take a snapshot of a volume
    (e.g. GlusterFS) attached to a running instance, a new snapshot
    file is created, i.e. volume-<uuid>.<snapshot-uuid>. The
    instance uses this file as the active volume. If you shutdown
    and restart the instance, nova will reattach the base volume
    (volume-<uuid>) to the instance instead of the snapshot volume
    (volume-<uuid>.<snapshot-uuid>). The expected behavior is to
    have the snapshot volume reattach to the instance. This is
    caused by stale data being returned from the database when
    _get_instance_volume_block_device_info is called during
    _power_on. To fix this bug, this patch calls
    refresh_connection_info to update the database in both
    volume_snapshot_create and _volume_snapshot_delete methods of the
    libvirt driver.

    Change-Id: I0f340a3f879580e7981d97863bc299e33d71aa84
    Closes-Bug: #1304695
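
The merged change follows the shape Nikola suggested. A paraphrase of the approach (the helper name and wiring here are approximations, not the verbatim patch):

from nova.objects import block_device as block_device_obj
from nova.virt import block_device as driver_block_device

def _volume_refresh_connection_info(self, context, instance, volume_id):
    # Re-fetch connection_info from Cinder and persist it, so the
    # database row points at the current active file.
    bdm = block_device_obj.BlockDeviceMapping.get_by_volume_id(
        context, volume_id)
    driver_bdm = driver_block_device.DriverVolumeBlockDevice(bdm)
    driver_bdm.refresh_connection_info(context, instance,
                                       self._volume_api, self)

# volume_snapshot_create() and volume_snapshot_delete() invoke this
# after the snapshot operation finishes, so a later power-on reads
# fresh connection_info instead of the stale base-file entry.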

Changed in nova:
status: In Progress → Fix Committed
tags: added: icehouse-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/98255

Thierry Carrez (ttx)
Changed in nova:
milestone: none → juno-1
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/icehouse)

Reviewed: https://review.openstack.org/98255
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=96212b194d390c4de9298376aefe84e3a70008de
Submitter: Jenkins
Branch: stable/icehouse

commit 96212b194d390c4de9298376aefe84e3a70008de
Author: Thang Pham <email address hidden>
Date: Mon Apr 14 22:47:06 2014 -0400

    libvirt: Refresh volume connection_info after volume snapshot

    The following patch is related to the guest assisted snapshot
    functionality. When you take a snapshot of a volume
    (e.g. GlusterFS) attached to a running instance, a new snapshot
    file is created, i.e. volume-<uuid>.<snapshot-uuid>. The
    instance uses this file as the active volume. If you shutdown
    and restart the instance, nova will reattach the base volume
    (volume-<uuid>) to the instance instead of the snapshot volume
    (volume-<uuid>.<snapshot-uuid>). The expected behavior is to
    have the snapshot volume reattach to the instance. This is
    caused by stale data being returned from the database when
    _get_instance_volume_block_device_info is called during
    _power_on. To fix this bug, this patch calls
    refresh_connection_info to update the database in both
    volume_snapshot_create and _volume_snapshot_delete methods of the
    libvirt driver.

    Conflicts:
     nova/tests/virt/libvirt/test_libvirt.py

    Change-Id: I0f340a3f879580e7981d97863bc299e33d71aa84
    Closes-Bug: #1304695

tags: added: in-stable-icehouse
Chuck Short (zulcss)
tags: removed: icehouse-backport-potential
Alan Pevec (apevec)
tags: removed: in-stable-icehouse
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-1 → 2014.2