Nova recreates instance directory after migration/resize

Bug #1666831 reported by Maciej Jozefczyk
38
This bug affects 5 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Low
Maciej Jozefczyk
Ocata
Fix Committed
Low
Lee Yarwood

Bug Description

Description
===========
Nova recreates instance directory on source host after successful migration/resize when using QEMU Qcow2 file drives.

Nova after migration executes method driver.confirm_migration().
This method cleans instance directory (instance directory with suffix _resize):

nova/virt/libvirt/driver.py
1115 if os.path.exists(target):
1116 # Deletion can fail over NFS, so retry the deletion as required.
1117 # Set maximum attempt as 5, most test can remove the directory
1118 # for the second time.
1119 utils.execute('rm', '-rf', target, delay_on_retry=True,
1120 attempts=5)

After that Nova executes:
1122 root_disk = self.image_backend.by_name(instance, 'disk')

root_disk is used to remove rdb snapshots, but during execution of self.image_backend.by_name() nova recreates instance directory.

Flow:

driver.confirm_migration()->self._cleanup_resize()->self.image_backend.by_name() -> (nova/virt/libvirt/imagebackend.py) image_backend.by_name()->Qcow2.__init__()->Qcow2.resolve_driver_format().

Qcow2.resolve_driver_format():
 344 if self.disk_info_path is not None:
 345 fileutils.ensure_tree(os.path.dirname(self.disk_info_path))
 346 write_to_disk_info_file()

Steps to reproduce
==================

- spawn instance
- migrate/resize instance
- check that instance dir on old host still exists (example: /home/instances/<instance_uuid>/disk.info

Expected result
===============
After migration directory /home/instances/<instance_uuid> and file /home/instances/<instance_uuid> should not exist.

Actual result
=============
Nova leaves instance directory after migration/resize.

Environment
===========
1. Openstack Newton (it seems master is affected too).

2. Libvirt + KVM

3. Qcow2 file images on local disk.

Changed in nova:
assignee: nobody → Maciej Jozefczyk (maciej.jozefczyk)
Changed in nova:
status: New → In Progress
Changed in nova:
importance: Undecided → Low
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/437356

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/437356
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6347baf3d09036525b7f6df991ae440d558f9cc3
Submitter: Jenkins
Branch: master

commit 6347baf3d09036525b7f6df991ae440d558f9cc3
Author: Maciej Józefczyk <email address hidden>
Date: Thu Feb 23 12:56:04 2017 +0100

    Ensure that instance directory is removed after success migration/resize

    Nova recreates instance directory on source host after successful migration/resize.
    This patch removes directory of migrated instance from source host.

    Change-Id: Ic683f83e428106df64be42287e2c5f3b40e73da4
    Closes-Bug: #1666831

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/441037

Matt Riedemann (mriedem)
tags: added: libvirt migrate
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/441037
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=97752d9c54db13a1317620270c2c443e5702cc89
Submitter: Jenkins
Branch: stable/ocata

commit 97752d9c54db13a1317620270c2c443e5702cc89
Author: Maciej Józefczyk <email address hidden>
Date: Thu Feb 23 12:56:04 2017 +0100

    Ensure that instance directory is removed after success migration/resize

    Nova recreates instance directory on source host after successful migration/resize.
    This patch removes directory of migrated instance from source host.

    Change-Id: Ic683f83e428106df64be42287e2c5f3b40e73da4
    Closes-Bug: #1666831
    (cherry picked from commit 6347baf3d09036525b7f6df991ae440d558f9cc3)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.1

This issue was fixed in the openstack/nova 15.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.0.0b1

This issue was fixed in the openstack/nova 16.0.0.0b1 development milestone.

Revision history for this message
Oisin (oisin-omalley) wrote :

We're experiencing this issue with Nova 14.0.2 in Newton. As the fix appears small, could we see it back ported to Newton?

Revision history for this message
Lee Yarwood (lyarwood) wrote :

Newton [1] is currently under phase II support [2] so a low priority bugfix like this isn't appropriate for stable/newton unfortunatley.

[1] https://releases.openstack.org/
[2] https://docs.openstack.org/project-team-guide/stable-branches.html#support-phases

Revision history for this message
Jay Pipes (jaypipes) wrote :

Turns out that the fix for this bug accidentally broke boot-from-volume setups that use NFS. In particular, this line:

https://github.com/openstack/nova/blob/stable/ocata/nova/virt/libvirt/driver.py#L1149

 if os.path.exists(inst_base) and not root_disk.exists():

needs to be changed to this in order to prevent BFV instances from being destroyed on resize...

 if os.path.exists(inst_base) and not root_disk.exists() and not compute_utils.is_volume_backed_instance(instance._context, instance):

I'll open a new bug for this.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/566367

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/567623

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/567625

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/ocata)

Related fix proposed to branch: stable/ocata
Review: https://review.openstack.org/567630

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/566367
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8e3385707cb1ced55cd12b1314d8c0b68d354c38
Submitter: Zuul
Branch: master

commit 8e3385707cb1ced55cd12b1314d8c0b68d354c38
Author: Matt Riedemann <email address hidden>
Date: Fri May 4 12:58:07 2018 -0400

    libvirt: check image type before removing snapshots in _cleanup_resize

    Change Ic683f83e428106df64be42287e2c5f3b40e73da4 added some disk
    cleanup logic to _cleanup_resize because some image backends (Qcow2,
    Flat and Ploop) will re-create the instance directory and disk.info
    file when initializing the image backend object.

    However, that change did not take into account volume-backed instances
    being resized will not have a root disk *and* if the local disk is
    shared storage, removing the instance directory effectively deletes
    the instance files, like the console.log, on the destination host
    as well. Change I29fac80d08baf64bf69e54cf673e55123174de2a was made
    to resolve that issue.

    However (see the pattern?), if you're doing a resize of a
    volume-backed instance that is not on shared storage, we won't remove
    the instance directory from the source host in _cleanup_resize. If the
    admin then later tries to live migrate the instance back to that host,
    it will fail with DestinationDiskExists in the pre_live_migration()
    method.

    This change is essentially a revert of
    I29fac80d08baf64bf69e54cf673e55123174de2a and alternate fix for
    Ic683f83e428106df64be42287e2c5f3b40e73da4. Since the root problem
    is that creating certain imagebackend objects will recreate the
    instance directory and disk.info on the source host, we simply need
    to avoid creating the imagebackend object. The only reason we are
    getting an imagebackend object in _cleanup_resize is to remove
    image snapshot clones, which is only implemented by the Rbd image
    backend. Therefore, we can check to see if the image type supports
    clones and if not, don't go through the imagebackend init routine
    that, for some, will recreate the disk.

    Change-Id: Ib10081150e125961cba19cfa821bddfac4614408
    Closes-Bug: #1769131
    Related-Bug: #1666831
    Related-Bug: #1728603

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/567623
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=174764340d3c965d31143b39af4ab2e8ecefe594
Submitter: Zuul
Branch: stable/queens

commit 174764340d3c965d31143b39af4ab2e8ecefe594
Author: Matt Riedemann <email address hidden>
Date: Fri May 4 12:58:07 2018 -0400

    libvirt: check image type before removing snapshots in _cleanup_resize

    Change Ic683f83e428106df64be42287e2c5f3b40e73da4 added some disk
    cleanup logic to _cleanup_resize because some image backends (Qcow2,
    Flat and Ploop) will re-create the instance directory and disk.info
    file when initializing the image backend object.

    However, that change did not take into account volume-backed instances
    being resized will not have a root disk *and* if the local disk is
    shared storage, removing the instance directory effectively deletes
    the instance files, like the console.log, on the destination host
    as well. Change I29fac80d08baf64bf69e54cf673e55123174de2a was made
    to resolve that issue.

    However (see the pattern?), if you're doing a resize of a
    volume-backed instance that is not on shared storage, we won't remove
    the instance directory from the source host in _cleanup_resize. If the
    admin then later tries to live migrate the instance back to that host,
    it will fail with DestinationDiskExists in the pre_live_migration()
    method.

    This change is essentially a revert of
    I29fac80d08baf64bf69e54cf673e55123174de2a and alternate fix for
    Ic683f83e428106df64be42287e2c5f3b40e73da4. Since the root problem
    is that creating certain imagebackend objects will recreate the
    instance directory and disk.info on the source host, we simply need
    to avoid creating the imagebackend object. The only reason we are
    getting an imagebackend object in _cleanup_resize is to remove
    image snapshot clones, which is only implemented by the Rbd image
    backend. Therefore, we can check to see if the image type supports
    clones and if not, don't go through the imagebackend init routine
    that, for some, will recreate the disk.

    Conflicts:
          nova/tests/unit/virt/libvirt/test_driver.py

    NOTE(mriedem): The conflict is due to not having change
    Icdd039bb4374269d9da38e7f8d2e15e05ca8aadb in Queens.

    Change-Id: Ib10081150e125961cba19cfa821bddfac4614408
    Closes-Bug: #1769131
    Related-Bug: #1666831
    Related-Bug: #1728603
    (cherry picked from commit 8e3385707cb1ced55cd12b1314d8c0b68d354c38)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/567625
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c72a0a7665e96219f0301525edc513dda07b320b
Submitter: Zuul
Branch: stable/pike

commit c72a0a7665e96219f0301525edc513dda07b320b
Author: Matt Riedemann <email address hidden>
Date: Fri May 4 12:58:07 2018 -0400

    libvirt: check image type before removing snapshots in _cleanup_resize

    Change Ic683f83e428106df64be42287e2c5f3b40e73da4 added some disk
    cleanup logic to _cleanup_resize because some image backends (Qcow2,
    Flat and Ploop) will re-create the instance directory and disk.info
    file when initializing the image backend object.

    However, that change did not take into account volume-backed instances
    being resized will not have a root disk *and* if the local disk is
    shared storage, removing the instance directory effectively deletes
    the instance files, like the console.log, on the destination host
    as well. Change I29fac80d08baf64bf69e54cf673e55123174de2a was made
    to resolve that issue.

    However (see the pattern?), if you're doing a resize of a
    volume-backed instance that is not on shared storage, we won't remove
    the instance directory from the source host in _cleanup_resize. If the
    admin then later tries to live migrate the instance back to that host,
    it will fail with DestinationDiskExists in the pre_live_migration()
    method.

    This change is essentially a revert of
    I29fac80d08baf64bf69e54cf673e55123174de2a and alternate fix for
    Ic683f83e428106df64be42287e2c5f3b40e73da4. Since the root problem
    is that creating certain imagebackend objects will recreate the
    instance directory and disk.info on the source host, we simply need
    to avoid creating the imagebackend object. The only reason we are
    getting an imagebackend object in _cleanup_resize is to remove
    image snapshot clones, which is only implemented by the Rbd image
    backend. Therefore, we can check to see if the image type supports
    clones and if not, don't go through the imagebackend init routine
    that, for some, will recreate the disk.

    Change-Id: Ib10081150e125961cba19cfa821bddfac4614408
    Closes-Bug: #1769131
    Related-Bug: #1666831
    Related-Bug: #1728603
    (cherry picked from commit 8e3385707cb1ced55cd12b1314d8c0b68d354c38)
    (cherry picked from commit 174764340d3c965d31143b39af4ab2e8ecefe594)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/567630
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a16fa14ce47bd2d3a5189047e9bd330a607ef3cc
Submitter: Zuul
Branch: stable/ocata

commit a16fa14ce47bd2d3a5189047e9bd330a607ef3cc
Author: Matt Riedemann <email address hidden>
Date: Fri May 4 12:58:07 2018 -0400

    libvirt: check image type before removing snapshots in _cleanup_resize

    Change Ic683f83e428106df64be42287e2c5f3b40e73da4 added some disk
    cleanup logic to _cleanup_resize because some image backends (Qcow2,
    Flat and Ploop) will re-create the instance directory and disk.info
    file when initializing the image backend object.

    However, that change did not take into account volume-backed instances
    being resized will not have a root disk *and* if the local disk is
    shared storage, removing the instance directory effectively deletes
    the instance files, like the console.log, on the destination host
    as well. Change I29fac80d08baf64bf69e54cf673e55123174de2a was made
    to resolve that issue.

    However (see the pattern?), if you're doing a resize of a
    volume-backed instance that is not on shared storage, we won't remove
    the instance directory from the source host in _cleanup_resize. If the
    admin then later tries to live migrate the instance back to that host,
    it will fail with DestinationDiskExists in the pre_live_migration()
    method.

    This change is essentially a revert of
    I29fac80d08baf64bf69e54cf673e55123174de2a and alternate fix for
    Ic683f83e428106df64be42287e2c5f3b40e73da4. Since the root problem
    is that creating certain imagebackend objects will recreate the
    instance directory and disk.info on the source host, we simply need
    to avoid creating the imagebackend object. The only reason we are
    getting an imagebackend object in _cleanup_resize is to remove
    image snapshot clones, which is only implemented by the Rbd image
    backend. Therefore, we can check to see if the image type supports
    clones and if not, don't go through the imagebackend init routine
    that, for some, will recreate the disk.

    Change-Id: Ib10081150e125961cba19cfa821bddfac4614408
    Closes-Bug: #1769131
    Related-Bug: #1666831
    Related-Bug: #1728603
    (cherry picked from commit 8e3385707cb1ced55cd12b1314d8c0b68d354c38)
    (cherry picked from commit 174764340d3c965d31143b39af4ab2e8ecefe594)
    (cherry picked from commit c72a0a7665e96219f0301525edc513dda07b320b)

tags: added: in-stable-ocata
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.