OpenStack Compute (nova)

libvirt migrate/resize on shared storage can cause data loss

Bug #1177247 reported by Rafi Khardalian on 2013-05-07

This bug affects 2 people

Affects	Status	Importance	Assigned to	Milestone
OpenStack Compute (nova)	Fix Released	High	Dan Smith	OpenStack Compute (nova) 2013.2 "havana"
Grizzly	Fix Released	High	Rafi Khardalian	OpenStack Compute (nova) 2013.1.3

Bug Description

When using shared storage across hypervisors, libvirt driver resize/migrate operations can result in a loss of instance data. This happens because many of the operations that create a copy of the instance run inside a try/except block. If any of them fails, the exception handler runs the following:

=== code ===

        except Exception:
            with excutils.save_and_reraise_exception():
                self._cleanup_remote_migration(dest, inst_base,
                                               inst_base_resize)

    def _cleanup_remote_migration(self, dest, inst_base, inst_base_resize):
        """Used only for cleanup in case migrate_disk_and_power_off fails."""
        try:
            if os.path.exists(inst_base_resize):
                utils.execute('rm', '-rf', inst_base)
                utils.execute('mv', inst_base_resize, inst_base)
                utils.execute('ssh', dest, 'rm', '-rf', inst_base)
        except Exception:
            pass

=== end ===

It doesn't take long looking at this code to see why this is a problem with shared storage. The source and destination hosts see the same inst_base, so the final ssh rm -rf deletes the original instance directory that the preceding mv has just restored.
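The failure mode can be illustrated with a small local simulation. This is only a sketch, not nova code: the function names, the `inst` / `inst_resize` directory names, and the use of two temporary directories to stand in for the source and destination hosts are all assumptions made for illustration. It also assumes, as the mv step in the handler implies, that the original instance directory has been renamed to the `_resize` path by the time the failure occurs. Passing the same directory for both hosts models shared storage.

```python
import os
import shutil
import tempfile


def cleanup_remote_migration(local_root, remote_root, inst_base, inst_base_resize):
    """Mimic the three cleanup steps using local directories only.

    local_root and remote_root are hypothetical stand-ins for what the
    source and destination hosts see; they are equal on shared storage.
    """
    src_base = os.path.join(local_root, inst_base)
    src_resize = os.path.join(local_root, inst_base_resize)
    dst_base = os.path.join(remote_root, inst_base)
    if os.path.exists(src_resize):
        shutil.rmtree(src_base, ignore_errors=True)  # rm -rf inst_base
        shutil.move(src_resize, src_base)            # mv inst_base_resize inst_base
        shutil.rmtree(dst_base, ignore_errors=True)  # ssh dest rm -rf inst_base


def simulate(shared):
    """Return True if the original disk file survives the cleanup."""
    src = tempfile.mkdtemp()
    dst = src if shared else tempfile.mkdtemp()
    try:
        # Assumed state at failure time: the original directory lives at
        # the _resize path and a partial copy exists on the destination.
        os.makedirs(os.path.join(src, "inst_resize"))
        with open(os.path.join(src, "inst_resize", "disk"), "w") as f:
            f.write("original data")
        os.makedirs(os.path.join(dst, "inst"), exist_ok=True)
        cleanup_remote_migration(src, dst, "inst", "inst_resize")
        return os.path.exists(os.path.join(src, "inst", "disk"))
    finally:
        shutil.rmtree(src, ignore_errors=True)
        if not shared:
            shutil.rmtree(dst, ignore_errors=True)
```

With separate directories, `simulate(shared=False)` leaves the restored disk in place; with a single shared directory, `simulate(shared=True)` shows the last step removing it.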

The issue can be easily reproduced by issuing a resize of an instance with a large root disk. In the middle of the resize, kill the ssh process created from the following call (https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L3508) and observe the exception handler destroying everything.

Rafi Khardalian (rkhardalian) wrote on 2013-05-07:

I've got a patch ready to be submitted.

description:	updated
Changed in nova:
assignee:	nobody → Rafi Khardalian (rkhardalian)

OpenStack Infra (hudson-openstack) wrote on 2013-05-07: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/28424

Changed in nova:
status:	New → In Progress

OpenStack Infra (hudson-openstack) on 2013-05-09

Changed in nova:
assignee:	Rafi Khardalian (rkhardalian) → Dan Smith (danms)

OpenStack Infra (hudson-openstack) wrote on 2013-05-10: Fix merged to nova (master)

Reviewed: https://review.openstack.org/28424
Committed: http://github.com/openstack/nova/commit/9290bddd9f270d8ea4fbd6d953a8634473979cd5
Submitter: Jenkins
Branch: master

commit 9290bddd9f270d8ea4fbd6d953a8634473979cd5
Author: Rafi Khardalian <email address hidden>
Date: Sun May 5 22:18:33 2013 +0000

Make resize/migrated shared storage aware

Fixes bug 1177247

    Added some logic to check for whether or not we are on a shared
    filesystem and set shared_storage accordingly. We perform similar
    checks in other areas of the code, typically through RPC calls.
    However, all the resize/migrate code is slated to be refactored for
    Havana, so the idea was to keep this patch as minimally intrusive as
    possible.

    When shared_storage is true, we pass that on to the cleanup call
    so that it no longer executes an rm via SSH, which was ultimately
    destroying the original instance directory.

Change-Id: Ie9decedd373c000211c171df64e1e96fe78e5081
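The behaviour described in the commit message can be sketched roughly as follows. This is not the merged patch: the `execute` and `path_exists` parameters are hypothetical injection points (standing in for nova's utils.execute and os.path.exists) so the sketch runs without nova, and the paths and host name in `record_commands` are made up. How shared storage is detected is not covered here; the flag is simply taken as given.

```python
import os


def cleanup_remote_migration(dest, inst_base, inst_base_resize, execute,
                             shared_storage=False,
                             path_exists=os.path.exists):
    """Roll back a failed migration, skipping the remote rm on shared storage.

    `execute` is a hypothetical stand-in for nova's utils.execute.
    """
    try:
        if path_exists(inst_base_resize):
            execute('rm', '-rf', inst_base)
            execute('mv', inst_base_resize, inst_base)
            if not shared_storage:
                # Only safe when dest holds its own copy of the directory.
                execute('ssh', dest, 'rm', '-rf', inst_base)
    except Exception:
        # Best-effort cleanup, mirroring the original handler.
        pass


def record_commands(shared_storage):
    """Return the commands the cleanup would issue, without running them."""
    calls = []
    cleanup_remote_migration(
        'dest-host', '/instances/inst', '/instances/inst_resize',
        execute=lambda *cmd: calls.append(cmd),
        shared_storage=shared_storage,
        path_exists=lambda path: True)
    return calls
```

Calling `record_commands(True)` yields only the local rm and mv, while `record_commands(False)` also includes the ssh rm against the destination.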

Changed in nova:
status:	In Progress → Fix Committed

Thierry Carrez (ttx) on 2013-05-29

Changed in nova:
milestone:	none → havana-1
status:	Fix Committed → Fix Released

Rafi Khardalian (rkhardalian) on 2013-06-12

tags:	added: grizzly-backport-potential

OpenStack Infra (hudson-openstack) wrote on 2013-06-12: Fix proposed to nova (stable/grizzly)

Fix proposed to branch: stable/grizzly
Review: https://review.openstack.org/32768

OpenStack Infra (hudson-openstack) wrote on 2013-07-29: Fix merged to nova (stable/grizzly)

Reviewed: https://review.openstack.org/32768
Committed: http://github.com/openstack/nova/commit/d34d4cacf7b20f72c67f7873dcf2c372abc60ecd
Submitter: Jenkins
Branch: stable/grizzly

commit d34d4cacf7b20f72c67f7873dcf2c372abc60ecd
Author: Rafi Khardalian <email address hidden>
Date: Sun May 5 22:18:33 2013 +0000

Make resize/migrated shared storage aware

Fixes bug 1177247 (for stable/grizzly)

    When shared_storage is true, we pass that on to the cleanup call
    so that it no longer executes an rm via SSH, which was ultimately
    destroying the original instance directory.

Change-Id: Ie9decedd373c000211c171df64e1e96fe78e5081
Cherry-Pick: 9290bddd9f270d8ea4fbd6d953a8634473979cd5

tags:	added: in-stable-grizzly

Alan Pevec (apevec) on 2013-08-06

tags:	removed: grizzly-backport-potential in-stable-grizzly
Changed in nova:
importance:	Undecided → High

Thierry Carrez (ttx) on 2013-10-17

Changed in nova:
milestone:	havana-1 → 2013.2
