Live migration breaks VMs on NUMA-enabled systems with shared storage

Bug #2080436 reported by Matthew Heler
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

The commit c1ccc1a3165ec1556c605b3b036274e992b0a09d introduced
a regression when NUMA live migration was done on shared storage

            power_management_possible = (
                'dst_numa_info' in migrate_data and
                migrate_data.dst_numa_info is not None)
            # No instance booting at source host, but instance dir
            # must be deleted for preparing next block migration
            # must be deleted for preparing next live migration w/o shared
            # storage
            # vpmem must be cleaned
            do_cleanup = (not migrate_data.is_shared_instance_path or
                          has_vpmem or has_mdevs or power_management_possible)

Based on this commit, if any kind of NUMA instance is live-migrated on shared storage, the migration will delete the VM's backing directory, making the VM unusable for future operations.
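The failure mode can be shown in isolation. Below is a hedged, standalone sketch of the quoted flag logic (simplified names and a plain dict instead of Nova's real migrate_data object; all of this is illustrative, not Nova's API). It shows that the mere presence of dst_numa_info flips do_cleanup to True even when the instance path is shared:

```python
# Standalone sketch of the quoted manager.py flag logic; the dict keys
# mirror migrate_data fields, but this is not Nova's real API.
def compute_flags(migrate_data, has_vpmem=False, has_mdevs=False):
    power_management_possible = (
        'dst_numa_info' in migrate_data and
        migrate_data['dst_numa_info'] is not None)
    do_cleanup = (not migrate_data['is_shared_instance_path'] or
                  has_vpmem or has_mdevs or power_management_possible)
    destroy_disks = not migrate_data['is_shared_block_storage']
    return do_cleanup, destroy_disks

# A NUMA instance whose instance directory lives on shared NFS storage:
do_cleanup, destroy_disks = compute_flags({
    'dst_numa_info': object(),        # set for any NUMA topology
    'is_shared_instance_path': True,  # e.g. NFS-backed instances path
    'is_shared_block_storage': False,
})
# do_cleanup comes out True purely because dst_numa_info is set, so the
# driver cleanup path now runs even though the instance path is shared.
```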

My team is experiencing this issue on 2024.1

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/928970

Changed in nova:
status: New → In Progress
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

I doubt that the root cause of the observed issue is the power mgmt change itself. I think we need to dig deeper.

This [1] is the relevant code (with a bit more context than what you pasted in the report):

            power_management_possible = (
                'dst_numa_info' in migrate_data and
                migrate_data.dst_numa_info is not None)
            # No instance booting at source host, but instance dir
            # must be deleted for preparing next block migration
            # must be deleted for preparing next live migration w/o shared
            # storage
            # vpmem must be cleaned
            do_cleanup = (not migrate_data.is_shared_instance_path or
                          has_vpmem or has_mdevs or power_management_possible)
            destroy_disks = not migrate_data.is_shared_block_storage

Note that the destroy_disks flag is not affected by the power_management_possible flag. So just because that flag is set, I would not expect the deletion of disks on shared storage.

What the flag affects is whether nova calls driver.cleanup after the live migration [2], but nova also passes the destroy_disks flag to that cleanup call. So after the power mgmt flag was added, nova started calling driver.cleanup for all NUMA instances, but the ones on shared storage should have destroy_disks false, and that was expected to prevent disk deletion.

The libvirt driver cleanup call with destroy_disks false has some extra logic and worrying comments [3].
It feels to me that there is a contradiction between [4] and [5].

[1] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L9198C1-L9209C1
[2] https://github.com/openstack/nova/blob/16d815990b53b9afa35fc4da38609f87820fa690/nova/compute/manager.py#L9411-L9420
[3] https://github.com/openstack/nova/blob/16d815990b53b9afa35fc4da38609f87820fa690/nova/virt/libvirt/driver.py#L1707-L1724
[4] https://github.com/openstack/nova/blob/16d815990b53b9afa35fc4da38609f87820fa690/nova/compute/manager.py#L9208
[5] https://github.com/openstack/nova/blob/16d815990b53b9afa35fc4da38609f87820fa690/nova/virt/libvirt/driver.py#L1713
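The gating described above can be sketched as follows (invented function names, not Nova's real signatures): do_cleanup decides whether driver cleanup runs at all, while destroy_disks is merely forwarded to it, so the protection against deleting shared disks rests entirely on the driver's cleanup logic honouring that flag.

```python
# Illustrative sketch of the call pattern in [2]; names are invented.
cleanup_calls = []

def driver_cleanup(destroy_disks):
    # Stand-in for the libvirt driver's cleanup; just record the flag.
    cleanup_calls.append(destroy_disks)

def post_live_migration(do_cleanup, destroy_disks):
    # do_cleanup gates the call; destroy_disks is only passed through.
    if do_cleanup:
        driver_cleanup(destroy_disks=destroy_disks)

# After the power mgmt change, a NUMA instance on shared storage gets
# do_cleanup=True, so cleanup runs and must itself honour destroy_disks.
post_live_migration(do_cleanup=True, destroy_disks=False)
```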

Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

To summarize the discussion:
 - previously cleanup wasn't called with shared storage
 - now with the change in question, we do call cleanup
 - cleanup is destroying the disks if neither the instance is volume-backed nor on shared RBD image backend

That's why we now call cleanup and destroy the disks with shared storage, which indeed is a regression.

A simple proposal I see would be to modify the clause for destroying the disks so that it also ensures the instance is not on shared storage.
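A hedged sketch of that proposal (illustrative only, not the merged patch): the destroy_disks clause would additionally require that the instance path is not shared.

```python
def destroy_disks_flag(is_shared_block_storage, is_shared_instance_path):
    # Proposed clause: never destroy disks when either the block storage
    # or the instance path is shared between source and destination.
    return not (is_shared_block_storage or is_shared_instance_path)

# NFS-backed instances path: disks must survive the migration.
# destroy_disks_flag(False, True) -> False
# Truly local storage: cleanup may still destroy the disks.
# destroy_disks_flag(False, False) -> True
```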

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote (last edit ):

@Matthew, could you try in your env the alternative fix I commented on in https://review.opendev.org/c/openstack/nova/+/928970?

The originally proposed fix would regress the power mgmt live migration feature, as we do need to call cleanup when power mgmt is in use, even if we should not destroy the disks.

Revision history for this message
Matthew Heler (mheler) wrote :

@Balazs, I tried the proposed changes and the backing instance directory was still removed during live migration.

Revision history for this message
Matthew Heler (mheler) wrote :

In this scenario, we have NUMA-pinned VMs and an NFS mount for the VMs, but we also have some VMs with Cinder boot volumes. That means the logic in the code assumes anything with NUMA needs to be cleaned up. Is this correct? It seems we may need a better way to determine whether power management is used. The issue is that the whole instance directory is cleaned up, not just the backing disk.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/928970
Committed: https://opendev.org/openstack/nova/commit/035b8404fce878b0a88c4741bea46135b6af51e8
Submitter: "Zuul (22348)"
Branch: master

commit 035b8404fce878b0a88c4741bea46135b6af51e8
Author: Matthew N Heler <email address hidden>
Date: Wed Sep 11 12:28:15 2024 -0500

    Fix regression with live migration on shared storage

    The commit c1ccc1a3165ec1556c605b3b036274e992b0a09d introduced
    a regression when NUMA live migration was done on shared storage

    The live migration support for the power mgmt feature means we need to
    call driver.cleanup() for all NUMA instances to potentially offline
    pcpus that are not used any more after the instance is migrated away.
    However this change exposed an issue with the disk cleanup logic. Nova
    should never delete the instance directory if that directory is on
    shared storage (e.g. the nova instances path is backed by NFS).

    This patch will fix that behavior so live migration will function

    Closes-Bug: #2080436
    Change-Id: Ia2bbb5b4ac728563a8aabd857ed0503449991df1

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/2024.1)

Fix proposed to branch: stable/2024.1
Review: https://review.opendev.org/c/openstack/nova/+/929903

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/2023.2)

Fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/nova/+/929968

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/nova/+/929969

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/2024.1)

Reviewed: https://review.opendev.org/c/openstack/nova/+/929903
Committed: https://opendev.org/openstack/nova/commit/57e037b507e5c31c9057fe463aec9cadaf2e6c63
Submitter: "Zuul (22348)"
Branch: stable/2024.1

commit 57e037b507e5c31c9057fe463aec9cadaf2e6c63
Author: Matthew N Heler <email address hidden>
Date: Wed Sep 11 12:28:15 2024 -0500

    Fix regression with live migration on shared storage

    The commit c1ccc1a3165ec1556c605b3b036274e992b0a09d introduced
    a regression when NUMA live migration was done on shared storage

    The live migration support for the power mgmt feature means we need to
    call driver.cleanup() for all NUMA instances to potentially offline
    pcpus that are not used any more after the instance is migrated away.
    However this change exposed an issue with the disk cleanup logic. Nova
    should never delete the instance directory if that directory is on
    shared storage (e.g. the nova instances path is backed by NFS).

    This patch will fix that behavior so live migration will function

    Closes-Bug: #2080436
    Change-Id: Ia2bbb5b4ac728563a8aabd857ed0503449991df1
    (cherry picked from commit 035b8404fce878b0a88c4741bea46135b6af51e8)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 30.0.0.0rc1

This issue was fixed in the openstack/nova 30.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/nova/+/929968
Committed: https://opendev.org/openstack/nova/commit/0e4406a33e6907d3b335455a3f3c3ff80045ea0f
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit 0e4406a33e6907d3b335455a3f3c3ff80045ea0f
Author: Matthew N Heler <email address hidden>
Date: Wed Sep 11 12:28:15 2024 -0500

    Fix regression with live migration on shared storage

    The commit c1ccc1a3165ec1556c605b3b036274e992b0a09d introduced
    a regression when NUMA live migration was done on shared storage

    The live migration support for the power mgmt feature means we need to
    call driver.cleanup() for all NUMA instances to potentially offline
    pcpus that are not used any more after the instance is migrated away.
    However this change exposed an issue with the disk cleanup logic. Nova
    should never delete the instance directory if that directory is on
    shared storage (e.g. the nova instances path is backed by NFS).

    This patch will fix that behavior so live migration will function

    2023.x only: Conflict resolved in compute/manager.py due to
    bluperint libvirt-mdev-live-migrate is not in 2023.x and the related
    unit test is removed.

    Closes-Bug: #2080436
    Change-Id: Ia2bbb5b4ac728563a8aabd857ed0503449991df1
    (cherry picked from commit 035b8404fce878b0a88c4741bea46135b6af51e8)
    (cherry picked from commit 57e037b507e5c31c9057fe463aec9cadaf2e6c63)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/nova/+/929969
Committed: https://opendev.org/openstack/nova/commit/e8999f0264a20bac56108fbc56e4a8a9fac61ae3
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit e8999f0264a20bac56108fbc56e4a8a9fac61ae3
Author: Matthew N Heler <email address hidden>
Date: Wed Sep 11 12:28:15 2024 -0500

    Fix regression with live migration on shared storage

    The commit c1ccc1a3165ec1556c605b3b036274e992b0a09d introduced
    a regression when NUMA live migration was done on shared storage

    The live migration support for the power mgmt feature means we need to
    call driver.cleanup() for all NUMA instances to potentially offline
    pcpus that are not used any more after the instance is migrated away.
    However this change exposed an issue with the disk cleanup logic. Nova
    should never delete the instance directory if that directory is on
    shared storage (e.g. the nova instances path is backed by NFS).

    This patch will fix that behavior so live migration will function

    2023.x only: Conflict resolved in compute/manager.py due to
    bluperint libvirt-mdev-live-migrate is not in 2023.x and the related
    unit test is removed.

    Closes-Bug: #2080436
    Change-Id: Ia2bbb5b4ac728563a8aabd857ed0503449991df1
    (cherry picked from commit 035b8404fce878b0a88c4741bea46135b6af51e8)
    (cherry picked from commit 57e037b507e5c31c9057fe463aec9cadaf2e6c63)
    (cherry picked from commit 0e4406a33e6907d3b335455a3f3c3ff80045ea0f)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 27.5.1

This issue was fixed in the openstack/nova 27.5.1 release.
