Comment 5 for bug 1414065

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/151664
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7dd6a4a19311136c02d89cd2afd97236b0f4cc27
Submitter: Jenkins
Branch: master

commit 7dd6a4a19311136c02d89cd2afd97236b0f4cc27
Author: Daniel P. Berrange <email address hidden>
Date: Thu Jan 29 14:33:32 2015 +0000

    libvirt: proper monitoring of live migration progress

    The current live migration code simply invokes migrateToURI
    and waits for it to finish, or raise an exception. It considers
    all exceptions to mean the live migration aborted and the VM is
    still running on the source host. This is totally bogus, as there
    are a number of reasons why an error could be raised from the
    migrateToURI call. There are at least 5 different scenarios for
    what the VM might be doing on source + dest host upon error.
    The migration might even still be in progress after the
    error has occurred.

    A more reliable way to deal with this is to actively query
    libvirt for the domain job status. This gives an indication
    of whether the job is completed, failed or cancelled. Even
    with that though, there is a need for a few heuristics to
    distinguish some of the possible error scenarios.

    This change to do active monitoring of the live migration process
    also opens the door for being able to tune live migration on the
    fly to adjust max downtime or bandwidth to improve chances of
    getting convergence, or to automatically abort it after too much
    time has elapsed instead of letting it carry on until the end of
    the universe. This change merely records memory transfer progress
    and leaves tuning improvements to a later date.

    Closes-bug: #1414065
    Change-Id: I6fcbfa31a79c7808c861bb3a84b56bd096882004
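
The polling approach described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit. The constants mirror libvirt's virDomainJobType enum and the list layout follows what the Python binding's jobInfo() returns (job type at index 0, memTotal at index 6, memRemaining at index 8). The function name, the get_job_info and migrate_call_done callables, and the "needs-inspection" result are hypothetical, introduced only for this example; the commit's actual heuristics for ambiguous cases are not reproduced here.

```python
import time

# Values of libvirt's virDomainJobType enum, copied here so the sketch
# runs without the libvirt bindings installed.
VIR_DOMAIN_JOB_NONE = 0
VIR_DOMAIN_JOB_BOUNDED = 1
VIR_DOMAIN_JOB_UNBOUNDED = 2
VIR_DOMAIN_JOB_COMPLETED = 3
VIR_DOMAIN_JOB_FAILED = 4
VIR_DOMAIN_JOB_CANCELLED = 5


def monitor_migration(get_job_info, migrate_call_done, progress,
                      poll_interval=0.5, sleep=time.sleep):
    """Poll the domain job status until the migration job ends.

    get_job_info:      hypothetical callable returning a list shaped like
                       the result of virDomain.jobInfo().
    migrate_call_done: hypothetical callable, True once the thread that
                       invoked migrateToURI has returned or raised.
    progress:          list that receives (mem_total, mem_remaining)
                       tuples while the job is running.
    """
    while True:
        info = get_job_info()
        job_type = info[0]

        if job_type in (VIR_DOMAIN_JOB_BOUNDED, VIR_DOMAIN_JOB_UNBOUNDED):
            # Job is active: only record memory transfer progress.
            progress.append((info[6], info[8]))
        elif job_type == VIR_DOMAIN_JOB_COMPLETED:
            return "completed"
        elif job_type == VIR_DOMAIN_JOB_FAILED:
            return "failed"
        elif job_type == VIR_DOMAIN_JOB_CANCELLED:
            return "cancelled"
        elif job_type == VIR_DOMAIN_JOB_NONE and migrate_call_done():
            # Ambiguous: no job is reported and the migrate call is over.
            # Assumption for this sketch: leave it to the caller to
            # inspect the domain on source and destination.
            return "needs-inspection"

        sleep(poll_interval)
```

With a real connection, get_job_info would typically wrap dom.jobInfo() on the source domain, and the loop would run alongside the thread that calls migrateToURI. Injecting the callables and the sleep function keeps the loop testable without a hypervisor.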