Comment 2 for bug 1905944

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/764435
Committed: https://opendev.org/openstack/nova/commit/39f0af5d18d6bea34fa15b8f7778115b25432749
Submitter: "Zuul (22348)"
Branch: master

commit 39f0af5d18d6bea34fa15b8f7778115b25432749
Author: Alexandre Arents <email address hidden>
Date: Thu Nov 26 15:24:19 2020 +0000

    libvirt: Abort live-migration job when monitoring fails

    During the live migration process, a _live_migration_monitor thread
    checks the progress of the migration on the source host. If for any
    reason we hit an infrastructure issue involving a DB, RPC, or
    libvirt-timeout failure, an exception is raised to the nova-compute
    service and the instance/migration is set to the ERROR state.

    The issue is that the live-migration job may be left running outside
    of nova's control. When the job finishes, the guest is resumed on the
    target host while nova still reports it on the source host. This may
    lead to a split-brain situation if the instance is restarted.

    This change aborts the live-migration job if an issue occurs
    during _live_migration_monitor.

    Change-Id: Ia593b500425c81e54eb401e38264db5cc5fc1f93
    Closes-Bug: #1905944
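
The behaviour described in the commit message can be illustrated with a minimal sketch. This is not the nova implementation: `FakeGuest`, `abort_job`, `monitor_live_migration`, and `poll_progress` are hypothetical names chosen for illustration, and the real monitor does considerably more. The sketch only shows the pattern of aborting the migration job when the monitoring loop itself fails, then re-raising so the caller can still set the ERROR state.

```python
class FakeGuest:
    """Hypothetical stand-in for a handle on the migrating guest."""

    def __init__(self):
        self.job_aborted = False

    def abort_job(self):
        # In a real driver this would ask the hypervisor to cancel the job.
        self.job_aborted = True


def monitor_live_migration(guest, poll_progress):
    """Poll until the migration completes; abort the job if polling fails.

    poll_progress is an assumed callable that returns True once the
    migration is finished and may raise on DB/RPC/timeout failures.
    """
    try:
        while not poll_progress():
            pass
    except Exception:
        # Monitoring failed: do not leave the job running unsupervised.
        guest.abort_job()
        # Re-raise so the caller can still mark the migration as failed.
        raise
```

With this shape, a failure inside the polling loop cancels the in-flight job before the error propagates, so the guest is not resumed on the target host while the source is still recorded as its location.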