revert resize removes rbd shared image

Bug #1314526 reported by Oleksiy
This bug report is a duplicate of: Bug #1399244: rbd resize revert fails.
This bug affects 5 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Nicolas Simonds

Bug Description

We run multi-host nova-compute with

libvirt_images_type=rbd
libvirt_images_rbd_pool=compute

Resize-confirm works fine.
Resize-revert, however, deletes the shared RBD image that both the original and the resized instance depend on.

nova.conf options I have tried changing, with no luck:

allow_resize_to_same_host=True/False
resize_fs_using_block_device=True/False
block_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_NON_SHARED_INC
live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER

The errors are at the bottom of this report.
1. The first error was fixed by setting image_cache_manager_interval = 0.
2. The second error still occurs.

During the revert, for both types of migration, driver.destroy() runs on the destination host and removes the original image from RBD storage.

https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3164
_____________
def revert_resize(self, context, instance, migration, reservations):
           ...
           self.driver.destroy(context, instance, network_info,
                               block_device_info)

           ...
_____________
that calls

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L956
_____________
def destroy(self, context, instance, network_info, block_device_info=None,
             destroy_disks=True):
     self._destroy(instance)
     self.cleanup(context, instance, network_info, block_device_info,
                  destroy_disks)
_____________

that calls

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1069
_____________
def cleanup(self, context, instance, network_info, block_device_info=None,
                destroy_disks=True):
    ....
        if destroy_disks:
            self._delete_instance_files(instance)
            self._cleanup_lvm(instance)
            #NOTE(haomai): destroy volumes if needed
            if CONF.libvirt.images_type == 'rbd':
                self._cleanup_rbd(instance)
    ....
_____________

revert_resize calls destroy() without passing destroy_disks, so it defaults to True and cleanup() deletes the SHARED image.

Here is an approximate solution (I am not a developer):

https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3199

change from :
_____________
self.driver.destroy(context, instance, network_info,
                                block_device_info)
_____________
to:
_____________
destroy_disks = not (self._is_instance_storage_shared(context, instance))
self.driver.destroy(context, instance, network_info,
                    block_device_info, destroy_disks=destroy_disks)
_____________
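
To make the effect of the default argument concrete, here is a small toy model of the call chain quoted above. It is only a sketch: FakeDriver, the dict standing in for the RBD pool, and the apply_fix switch are hypothetical and are not Nova or librbd code.

```python
# Toy model of revert_resize -> destroy(). FakeDriver and the dict-based
# "pool" are hypothetical stand-ins, not Nova or librbd code.


class FakeDriver:
    def __init__(self, pool):
        # Maps image name -> image data; the same pool is seen by every host.
        self.pool = pool

    def destroy(self, instance, destroy_disks=True):
        # Same default as the quoted destroy(): disks are removed unless
        # the caller says otherwise.
        if destroy_disks:
            self.pool.pop(instance + "_disk", None)


def revert_resize(driver, instance, storage_is_shared, apply_fix):
    if apply_fix:
        # Proposed change: keep disks that live on shared storage.
        driver.destroy(instance, destroy_disks=not storage_is_shared)
    else:
        # Reported behaviour: destroy_disks is left at its default of True.
        driver.destroy(instance)


pool = {"vm1_disk": b"data"}
revert_resize(FakeDriver(pool), "vm1", storage_is_shared=True, apply_fix=False)
image_lost_without_fix = "vm1_disk" not in pool

pool = {"vm1_disk": b"data"}
revert_resize(FakeDriver(pool), "vm1", storage_is_shared=True, apply_fix=True)
image_kept_with_fix = "vm1_disk" in pool

print(image_lost_without_fix, image_kept_with_fix)
```

In the toy model the shared image disappears when destroy() is called with its default and survives when destroy_disks is derived from a shared-storage check.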

ERROR1####################################################
<179>Apr 28 14:14:00 [compute] node-39 nova-nova.virt.libvirt.imagebackend ERROR: error opening rbd image /var/lib/nova/instances/_base/103bc0322b21e499ecea1c360abc6843ab829d06
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 467, in __init__
    read_only=read_only)
  File "/usr/lib/python2.7/dist-packages/rbd.py", line 351, in __init__
    raise make_ex(ret, 'error opening image %s at snapshot %s' % (name, snapshot))
ImageNotFound: error opening image /var/lib/nova/instances/_base/103bc0322b21e499ecea1c360abc6843ab829d06 at snapshot None
<179>Apr 28 14:14:00 [compute] node-39 nova-nova.compute.manager ERROR: Setting instance vm_state to ERROR
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 3160, in finish_resize
    disk_info, image)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 3128, in _finish_resize
    block_device_info, power_on)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4627, in finish_migration
    block_device_info=None, inject_files=False)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2395, in _create_image
    project_id=instance['project_id'])
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 177, in cache
    *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 638, in create_image
    self.verify_base_size(base, size)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 218, in verify_base_size
    base_size = self.get_disk_size(base)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 622, in get_disk_size
    with RBDVolumeProxy(self, name) as vol:
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/imagebackend.py", line 467, in __init__
    read_only=read_only)
  File "/usr/lib/python2.7/dist-packages/rbd.py", line 351, in __init__
    raise make_ex(ret, 'error opening image %s at snapshot %s' % (name, snapshot))
ERROR2####################################################

<179>Apr 29 08:34:13 [compute] node-39 nova-nova.virt.libvirt.driver ERROR: An error occurred while trying to launch a defined domain with xml ...
<179>Apr 29 08:34:13 [compute] node-39 nova-nova.compute.manager ERROR: Setting instance vm_state to ERROR
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 4997, in _error_out_instance_on_exception
    yield
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2841, in finish_revert_resize
    block_device_info, power_on)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4673, in finish_revert_migration
    block_device_info, power_on)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3255, in _create_domain_and_network
    domain = self._create_domain(xml, instance=instance, power_on=power_on)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3198, in _create_domain
    domain.XMLDesc(0))
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3193, in _create_domain
    domain.createWithFlags(launch_flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 179, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 139, in proxy_call
    rv = execute(f,*args,**kwargs)
  File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 77, in tworker
    rv = meth(*args,**kwargs)
  File "/usr/lib/python2.7/dist-packages/libvirt.py", line 711, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: internal error process exited while connecting to monitor: char device redirected to /dev/pts/4
kvm: -drive file=rbd:.... hiden... No such file or directory
####################################################

Oleksiy (lev4ykaol)
description: updated
Oleksiy (lev4ykaol)
description: updated
Tracy Jones (tjones-i)
tags: added: libvirt
Solly Ross (sross-7)
Changed in nova:
status: New → Triaged
importance: Undecided → High
Dmitry Borodaenko (angdraug) wrote :

This is related to:
https://bugs.launchpad.net/nova/+bug/1250751

A fix that should address both problems is proposed here:
https://review.openstack.org/91722

Changed in nova:
assignee: nobody → Dmitry Borodaenko (dborodaenko)
Changed in nova:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/91722
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bc45c56f102cdef58840e02b609a89f5278e8cce
Submitter: Jenkins
Branch: master

commit bc45c56f102cdef58840e02b609a89f5278e8cce
Author: Dmitry Borodaenko <email address hidden>
Date: Thu Nov 21 16:05:19 2013 -0800

    Improve shared storage checks for live migration

    Due to an assumption that libvirt live migrations work only when both
    instance path and disk data is shared between source and destination
    hosts (e.g. libvirt instances directory is on NFS), instance disks are
    removed from shared storage when instance path is not shared (e.g. Ceph
    RBD backend is enabled).

    Distinguish cases that require shared instance drive and shared libvirt
    instance directory. Reflect the fact that RBD backed instances have
    shared instance drive (and no shared libvirt instance directory) in the
    relevant conditionals.

    UpgradeImpact: Live migrations from or to a compute host running a
    version of Nova pre-dating this commit are disabled in order to
    eliminate possibility of data loss. Upgrade Nova on both the source and
    the target node before attempting a live migration.

    Closes-bug: 1250751
    Closes-bug: 1314526
    Co-authored-by: Ryan Moe <email address hidden>
    Co-authored-by: Yaguang Tang <email address hidden>
    Signed-off-by: Dmitry Borodaenko <email address hidden>
    Change-Id: I2755c59b4db736151000dae351fd776d3c15ca39
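
The distinction drawn in this commit message can be sketched as two independent flags. This is an illustration only: StorageLayout, layout_for and source_cleanup_plan are hypothetical names, and the rule that an NFS-hosted instances directory also implies shared disks is an assumption of the sketch, not something taken from the patch.

```python
from dataclasses import dataclass


@dataclass
class StorageLayout:
    shared_instance_path: bool  # e.g. the libvirt instances directory is on NFS
    shared_block_storage: bool  # e.g. the disks live in a Ceph RBD pool


def layout_for(images_type, instances_dir_on_nfs):
    # Assumption for this sketch: file-backed disks are shared only when the
    # instances directory is shared, while 'rbd' disks are always shared.
    return StorageLayout(
        shared_instance_path=instances_dir_on_nfs,
        shared_block_storage=(images_type == "rbd") or instances_dir_on_nfs,
    )


def source_cleanup_plan(layout):
    # Each resource is removed from the source only if the destination
    # does not see the same copy.
    return {
        "delete_instance_dir": not layout.shared_instance_path,
        "delete_disks": not layout.shared_block_storage,
    }


rbd_plan = source_cleanup_plan(layout_for("rbd", instances_dir_on_nfs=False))
nfs_plan = source_cleanup_plan(layout_for("default", instances_dir_on_nfs=True))
local_plan = source_cleanup_plan(layout_for("default", instances_dir_on_nfs=False))
print(rbd_plan, nfs_plan, local_plan)
```

With a single "is storage shared" flag, the RBD case is indistinguishable from the local-disk case, which is how the disks ended up being deleted.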

Changed in nova:
status: In Progress → Fix Committed
Changed in nova:
milestone: none → juno-2
status: Fix Committed → Fix Released
Solly Ross (sross-7) wrote :

The merged patch only fixes live migration -- nothing has changed for cold migrations/resizes (it still just calls resize) -- I ran into this with my work for bp use-libvirt-storage-pools.

Changed in nova:
status: Fix Released → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/124161

Thierry Carrez (ttx)
Changed in nova:
milestone: juno-2 → none
Dmitry Borodaenko (angdraug) wrote :

Solly, please confirm if https://review.openstack.org/121745 has not resolved the rest of the resize problem.

Changed in nova:
assignee: Dmitry Borodaenko (dborodaenko) → Solly Ross (sross-7)
status: Confirmed → Incomplete
Dmitry Borodaenko (angdraug) wrote :

I found bug #1399244 that was opened for what seems to be the same problem, reassigning this bug to Jon Bernard who's got a fix out for review at https://review.openstack.org/139693.

Changed in nova:
assignee: Solly Ross (sross-7) → Jon Bernard (jbernard)
status: Incomplete → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/icehouse)

Change abandoned by Sean Dague (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/124161
Reason: This review is > 4 weeks without comment and currently blocked by a core reviewer with a -2. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and contacting the reviewer with the -2 on this review to ensure you address their concerns.

Davanum Srinivas (DIMS) (dims-v) wrote :

Removing "In Progress" status and assignee as change is abandoned.

Changed in nova:
status: In Progress → Confirmed
assignee: Jon Bernard (jbernard) → nobody
Daniel Speichert (dasp) wrote :

This is still a major problem for regular resize (while cancelling) on latest Juno stable release.

Changed in nova:
assignee: nobody → Dan Smith (danms)
status: Confirmed → In Progress
Changed in nova:
assignee: Dan Smith (danms) → Nicolas Simonds (nicolas.simonds)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/228505

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/liberty)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/liberty
Review: https://review.openstack.org/228505
Reason: Abandoning since the master change hasn't been approved, please restore and re-propose once that happens if you still need this for stable/liberty.

Matt Riedemann (mriedem)
tags: added: ceph
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/187395
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=29476a67d4d4ff7eedd97f589422d47015111448
Submitter: Jenkins
Branch: master

commit 29476a67d4d4ff7eedd97f589422d47015111448
Author: Nicolas Simonds <email address hidden>
Date: Mon Jun 1 13:58:37 2015 -0700

    libvirt: Fix/implement revert-resize for RBD-backed images

    * Makes a snapshot of Ceph-backed roots prior to resize
    * Rolls back to snapshot on revert
    * Destroys resize snapshots on image cleanup

    Closes-Bug: 1369465
    Closes-Bug: 1314526
    Change-Id: I328d2c41696a9c0f090f822a51ea42fac83f62ec
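
The three steps listed in this commit message can be sketched with an in-memory stand-in for an RBD image. This is not the merged patch: FakeRbdImage and the snapshot name are hypothetical. Only the method names create_snap, rollback_to_snap and remove_snap are borrowed from the Ceph Python rbd bindings, and here they act on a plain Python object.

```python
# In-memory stand-in for an RBD image; not the real rbd.Image class.
class FakeRbdImage:
    def __init__(self, size_gb):
        self.size_gb = size_gb
        self.snaps = {}  # snapshot name -> size recorded at snapshot time

    def create_snap(self, name):
        self.snaps[name] = self.size_gb

    def rollback_to_snap(self, name):
        self.size_gb = self.snaps[name]

    def remove_snap(self, name):
        del self.snaps[name]


RESIZE_SNAP = "resize-snapshot"  # hypothetical snapshot name for this sketch


def resize(image, new_size_gb):
    # Step 1: snapshot the root disk before touching it.
    image.create_snap(RESIZE_SNAP)
    image.size_gb = new_size_gb


def revert_resize(image):
    # Step 2: roll the disk back instead of deleting it.
    image.rollback_to_snap(RESIZE_SNAP)


def cleanup(image):
    # Step 3: drop the resize snapshot once it is no longer needed.
    if RESIZE_SNAP in image.snaps:
        image.remove_snap(RESIZE_SNAP)


image = FakeRbdImage(size_gb=10)
resize(image, 20)
size_after_resize = image.size_gb
revert_resize(image)
size_after_revert = image.size_gb
cleanup(image)
snaps_after_cleanup = dict(image.snaps)
print(size_after_resize, size_after_revert, snaps_after_cleanup)
```

The point of the design is that revert no longer needs to destroy anything on the shared pool; the image stays in place and only its contents are rolled back.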

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/liberty)

Reviewed: https://review.openstack.org/228505
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=eb1f67c4374fd8210829deac8e49d6f3df6cae4b
Submitter: Jenkins
Branch: stable/liberty

commit eb1f67c4374fd8210829deac8e49d6f3df6cae4b
Author: Nicolas Simonds <email address hidden>
Date: Mon Jun 1 13:58:37 2015 -0700

    libvirt: Fix/implement revert-resize for RBD-backed images

    * Makes a snapshot of Ceph-backed roots prior to resize
    * Rolls back to snapshot on revert
    * Destroys resize snapshots on image cleanup

    Conflicts:
        nova/tests/unit/virt/libvirt/test_driver.py

    because (I70215fb25ef25422786b96d33c91d8f1d4760a23) isn't on liberty

    (cherry picked from commit 29476a67d4d4ff7eedd97f589422d47015111448)

    Closes-Bug: 1369465
    Closes-Bug: 1314526
    Change-Id: I328d2c41696a9c0f090f822a51ea42fac83f62ec

tags: added: in-stable-liberty
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 13.0.0.0b3

This issue was fixed in the openstack/nova 13.0.0.0b3 development milestone.

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 12.0.2

This issue was fixed in the openstack/nova 12.0.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/294599

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/kilo)

Change abandoned by Liang Chen (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/294599
