live migration does not clean up at target node if a failure occurs during post migration

Bug #1628606 reported by Paul Carlton
This bug affects 4 people
Affects                     Status        Importance   Assigned to      Milestone
OpenStack Compute (nova)    In Progress   Low          Artom Lifshitz
  Wallaby                   New           Undecided    Unassigned
  Xena                      New           Undecided    Unassigned
  Yoga                      New           Undecided    Unassigned
  Zed                       New           Undecided    Unassigned

Bug Description

If a live migration fails during the post-processing on the source (e.g. a failure to disconnect volumes), the instance can end up shut down on the source node and left in a migrating task state. The copy of the instance on the target node is also left running, but it is not usable, because neutron networking has not yet been switched to the target and nova still records the instance as being on the source node.

This situation can be resolved as follows:

On the target node:
    virsh destroy <instance domain id>
If the compute nodes are NOT using shared storage, also run:
    sudo rm -rf <instance uuid directory>

Then use the nova client as admin to restart the instance on the source node:
    nova reset-state --active <instance uuid>
    nova reboot --hard <instance uuid>

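The manual recovery steps above can be captured in a small helper that only *builds* the command lines, so an operator can review them before running anything. This is a sketch under assumptions: the helper names are hypothetical, and the instance directory is assumed to live under nova's default `instances_path` (`/var/lib/nova/instances`); check `instances_path` in nova.conf on the target node. Order matters: the target domain must be destroyed before the hard reboot on the source, otherwise the guest could end up running on two hosts.

```python
import shlex
from typing import List

# Assumption: nova's default instances_path; your deployment may differ.
DEFAULT_INSTANCES_PATH = "/var/lib/nova/instances"


def target_cleanup_commands(domain: str, instance_uuid: str,
                            shared_storage: bool,
                            instances_path: str = DEFAULT_INSTANCES_PATH) -> List[str]:
    """Commands to run on the target node (hypothetical helper)."""
    cmds = ["virsh destroy " + shlex.quote(domain)]
    if not shared_storage:
        # With shared storage this directory also backs the source's disks,
        # so it must only be removed when storage is NOT shared.
        path = instances_path.rstrip("/") + "/" + instance_uuid
        cmds.append("sudo rm -rf " + shlex.quote(path))
    return cmds


def source_restart_commands(instance_uuid: str) -> List[str]:
    """Admin nova CLI commands to restart the instance on the source node."""
    uuid = shlex.quote(instance_uuid)
    return ["nova reset-state --active " + uuid,
            "nova reboot --hard " + uuid]


if __name__ == "__main__":
    # Print the plan for review; nothing is executed.
    for cmd in target_cleanup_commands("instance-0000002a", "example-uuid", False):
        print(cmd)
    for cmd in source_restart_commands("example-uuid"):
        print(cmd)
```
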
I will investigate how to address this issue.

Changed in nova:
assignee: nobody → Paul Carlton (paul-carlton2)
Changed in nova:
importance: Undecided → Low
tags: added: live-migration
Paul Carlton (paul-carlton2) wrote :

Thinking about this, it is not that simple. Once the instance has been started on the target, it could do work that would be lost if we destroy it and resurrect the instance on the source. As we found out when Matt Booth was fixing the post-copy network bug with certain neutron providers, the instance on the target becomes accessible to the network as soon as it starts up (due to ARPing). So once libvirt has un-paused the instance on the target and destroyed the instance on the source, we are effectively beyond the point of no return.

The trouble is that the instance host does not get updated until the end of the post-migration processing, so it still looks like it is on the source in a migrating state. If any step in post migration raises an exception, the rest of the post-migration processing is skipped and the migration is marked as failed, but the instance is left as is.

The best solution I can think of is to wrap the call to the post method in a try/except that sets the instance host to the target if any exception occurs. In some circumstances the source instance could still be present (i.e. not cleaned up), and the networking to the target might not be set up correctly, so I'm thinking the instance on the target should perhaps be placed in an error state to indicate that there may be an issue. Alternatively, is the fact that the migration status will be "failed" enough to indicate that further operator action might be needed?

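The try/except proposal above could look roughly like this. It is a sketch, not the abandoned patch: `Instance` is a hypothetical stand-in for nova's Instance object, `run_post_method` is a made-up name, and the optional error state corresponds to the open question in the comment.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Instance:
    """Hypothetical stand-in for nova's Instance object."""
    host: str
    vm_state: str = "active"
    task_state: Optional[str] = "migrating"


def run_post_method(post: Callable[[Instance], None], instance: Instance,
                    target: str, mark_error: bool = True) -> None:
    """Run post-migration steps; on any failure, point the instance at target."""
    try:
        post(instance)
    except Exception:
        # The guest already runs on the target, so recording the source
        # as its host is never correct past this point.
        instance.host = target
        instance.task_state = None
        if mark_error:
            # Signal that the source may not be cleaned up, or that
            # networking may not have been switched over yet.
            instance.vm_state = "error"
        raise
```

Whether to set the error state is the judgment call raised above; the migration record being "failed" may already be enough of a signal for operators.
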
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/379491

Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Paul Carlton (<email address hidden>) on branch: master
Review: https://review.openstack.org/379491
Reason: Leaving for someone else to fix as they see fit

Sivasathurappan Radhakrishnan (siva-radhakrishnan) wrote :

Since Paul Carlton abandoned his patch, I am removing him as assignee.

Changed in nova:
assignee: Paul Carlton (paul-carlton2) → nobody
status: In Progress → Confirmed
Matthew Booth (mbooth-9) wrote :

I think this bug is pretty serious. Say we get a Cinder error in driver.post_live_migration() (this specific example is taken from a customer bug):

ComputeManager._post_live_migration() does:

  ...
  self.driver.post_live_migration(ctxt, instance, block_device_info,
                                  migrate_data)
  ...
  self.compute_rpcapi.post_live_migration_at_destination(ctxt,
          instance, block_migration, dest)

The above code runs on the source compute. We update instance.host to the destination in post_live_migration_at_destination. Therefore, if driver.post_live_migration() above fails, we never call post_live_migration_at_destination, and we never update instance.host to point to the destination.

However, _post_live_migration is called via a callback from the driver *after* migration has occurred. So at this point the VM is *actually running* on the destination, but Nova thinks it's still on the source. The instance will be in an error state, and a hard reboot at this point will cause it to start running again on the source, at which point it will be running on 2 compute hosts simultaneously.

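A toy model of the ordering problem described above: because both calls run sequentially on the source, an exception in the driver step means the destination step, the only place `instance.host` moves, is never reached. The class and function names are hypothetical and only mirror the calls in the snippet; the real methods take more arguments.

```python
class FailingDriver:
    """Hypothetical driver whose post-migration cleanup hits a Cinder error."""

    def post_live_migration(self, instance):
        raise RuntimeError("cinder: failed to terminate volume connection")


class DestinationRPC:
    """Hypothetical RPC stub; the destination step is where host is updated."""

    def post_live_migration_at_destination(self, instance, dest):
        instance["host"] = dest


def post_live_migration(driver, rpcapi, instance, dest):
    # Runs on the source, *after* the guest is already live on dest.
    driver.post_live_migration(instance)
    # Never reached if the line above raises, so host stays on the source.
    rpcapi.post_live_migration_at_destination(instance, dest)


if __name__ == "__main__":
    instance = {"host": "source"}
    try:
        post_live_migration(FailingDriver(), DestinationRPC(), instance, "dest")
    except RuntimeError as exc:
        print("post-migration failed:", exc)
    print("nova thinks the guest is on:", instance["host"])  # still "source"
```
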
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/609517

Changed in nova:
assignee: nobody → Artom Lifshitz (notartom)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/611083

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/611084

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/611093

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/609517
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5513f48dea529fe4e690f50a462300129594210c
Submitter: Zuul
Branch: master

commit 5513f48dea529fe4e690f50a462300129594210c
Author: Artom Lifshitz <email address hidden>
Date: Wed Oct 10 14:53:14 2018 -0400

    Handle volume API failure in _post_live_migration

    Previously, if the call to Cinder in _post_live_migration failed, the
    exception went unhandled and prevented us from calling
    post_live_migration_at_destination - which is where we set instance
    host and task state. This left the system in an inconsistent state,
    with the instance actually running on the destination, but
    with instance.host still set to the source. This patch simply wraps
    the Cinder API calls in a try/except, and logs the exception instead
    of blowing up. While "dumb", this has the virtue of being simple and
    minimizing potential side effects. A comprehensive refactoring of
    when, where and how we set instance host and task state to try to
    guarantee consistency is left as a TODO.

    Partial-bug: 1628606
    Change-Id: Icb0bdaf454935b3713c35339394d260b33520de5

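The approach in this commit (wrap the Cinder calls, log instead of raising) can be sketched as a best-effort cleanup loop. This is an illustration, not the patch itself: `disconnect_source_volumes` is a hypothetical name, and the `volume_api.terminate_connection(volume_id, connector)` signature is simplified; nova's real Cinder wrapper also takes a request context, and newer attachment-based flows use different calls.

```python
import logging
from typing import Iterable, List

LOG = logging.getLogger(__name__)


def disconnect_source_volumes(volume_api, connector: dict,
                              volume_ids: Iterable[str]) -> List[str]:
    """Best-effort cleanup of source-host volume connections.

    Failures are logged rather than raised, so the caller can still go on
    to the destination step that updates instance.host.
    Returns the IDs of volumes whose cleanup failed.
    """
    failed = []
    for volume_id in volume_ids:
        try:
            volume_api.terminate_connection(volume_id, connector)
        except Exception:
            # Deliberately broad, matching the "dumb but simple" intent:
            # any volume API failure must not abort post-migration.
            LOG.exception("Failed to terminate connection for volume %s",
                          volume_id)
            failed.append(volume_id)
    return failed
```

Returning the failed IDs is an addition for the sketch; it leaves room for an operator-facing warning without changing the control flow.
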
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/611083
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cf3c2f391ad0f9d2a2d94247509bcb2709413e4f
Submitter: Zuul
Branch: stable/rocky

commit cf3c2f391ad0f9d2a2d94247509bcb2709413e4f
Author: Artom Lifshitz <email address hidden>
Date: Wed Oct 10 14:53:14 2018 -0400

    Handle volume API failure in _post_live_migration

    Previously, if the call to Cinder in _post_live_migration failed, the
    exception went unhandled and prevented us from calling
    post_live_migration_at_destination - which is where we set instance
    host and task state. This left the system in an inconsistent state,
    with the instance actually running on the destination, but
    with instance.host still set to the source. This patch simply wraps
    the Cinder API calls in a try/except, and logs the exception instead
    of blowing up. While "dumb", this has the virtue of being simple and
    minimizing potential side effects. A comprehensive refactoring of
    when, where and how we set instance host and task state to try to
    guarantee consistency is left as a TODO.

    Partial-bug: 1628606
    Change-Id: Icb0bdaf454935b3713c35339394d260b33520de5
    (cherry picked from commit 5513f48dea529fe4e690f50a462300129594210c)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/611084
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=53f9c8e5104076872021ed33e221e5526ca5059b
Submitter: Zuul
Branch: stable/queens

commit 53f9c8e5104076872021ed33e221e5526ca5059b
Author: Artom Lifshitz <email address hidden>
Date: Wed Oct 10 14:53:14 2018 -0400

    Handle volume API failure in _post_live_migration

    Previously, if the call to Cinder in _post_live_migration failed, the
    exception went unhandled and prevented us from calling
    post_live_migration_at_destination - which is where we set instance
    host and task state. This left the system in an inconsistent state,
    with the instance actually running on the destination, but
    with instance.host still set to the source. This patch simply wraps
    the Cinder API calls in a try/except, and logs the exception instead
    of blowing up. While "dumb", this has the virtue of being simple and
    minimizing potential side effects. A comprehensive refactoring of
    when, where and how we set instance host and task state to try to
    guarantee consistency is left as a TODO.

    Partial-bug: 1628606
    Change-Id: Icb0bdaf454935b3713c35339394d260b33520de5
    (cherry picked from commit 5513f48dea529fe4e690f50a462300129594210c)
    (cherry picked from commit cf3c2f391ad0f9d2a2d94247509bcb2709413e4f)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/611093
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=28bc3c8221c6f12bd9f1dd6701026f389b1fc2f6
Submitter: Zuul
Branch: stable/pike

commit 28bc3c8221c6f12bd9f1dd6701026f389b1fc2f6
Author: Artom Lifshitz <email address hidden>
Date: Wed Oct 10 14:53:14 2018 -0400

    Handle volume API failure in _post_live_migration

    Previously, if the call to Cinder in _post_live_migration failed, the
    exception went unhandled and prevented us from calling
    post_live_migration_at_destination - which is where we set instance
    host and task state. This left the system in an inconsistent state,
    with the instance actually running on the destination, but
    with instance.host still set to the source. This patch simply wraps
    the Cinder API calls in a try/except, and logs the exception instead
    of blowing up. While "dumb", this has the virtue of being simple and
    minimizing potential side effects. A comprehensive refactoring of
    when, where and how we set instance host and task state to try to
    guarantee consistency is left as a TODO.

    Conflicts in nova/compute/manager.py due to absence of new Cinder flow
    conditional (and corresponding modifications to tests).

    Partial-bug: 1628606
    Change-Id: Icb0bdaf454935b3713c35339394d260b33520de5
    (cherry picked from commit 5513f48dea529fe4e690f50a462300129594210c)
    (cherry picked from commit cf3c2f391ad0f9d2a2d94247509bcb2709413e4f)
    (cherry picked from commit 53f9c8e5104076872021ed33e221e5526ca5059b)

tags: added: in-stable-pike
Matt Riedemann (mriedem) wrote :

Bug 1818873 is related, possibly a duplicate.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.opendev.org/670016

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/pike)

Change abandoned by huanhongda (<email address hidden>) on branch: stable/pike
Review: https://review.opendev.org/670016
Reason: Maybe cherry-pick 013f421bca4067bd430a9fac1e3b290cf1388ee4 is a better way.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/pike
Review: https://review.opendev.org/670016
Reason: https://review.opendev.org/#/c/683008/ is better.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/791135

sean mooney (sean-k-mooney) wrote :

Downstream we are tracking this as
https://bugzilla.redhat.com/show_bug.cgi?id=1959759

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/854499
Committed: https://opendev.org/openstack/nova/commit/a20baeca1f5ebb0dfe9607335a6986e9ed0e1725
Submitter: "Zuul (22348)"
Branch: master

commit a20baeca1f5ebb0dfe9607335a6986e9ed0e1725
Author: Amit Uniyal <email address hidden>
Date: Thu Aug 25 05:08:44 2022 +0000

    Adds a repoducer for post live migration fail

    Adds a regression test or reproducer for post live migration
    fail at destination, the possible cause can be fail to get
    instance network info or block device info

    Related-Bug: #1628606
    Change-Id: I48dbe0aae8a3943fdde69cda1bd663d70ea0eb19

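The upstream reproducer lives in nova's own test tree; the sketch below is not that test, only a self-contained illustration of the same idea using `unittest.mock`: make a post-migration step fail and assert where `instance.host` ends up. `ToyComputeManager` is hypothetical, and `get_instance_nw_info` is just a mocked method name here.

```python
import unittest
from unittest import mock


class ToyComputeManager:
    """Hypothetical stand-in for the source compute manager."""

    def __init__(self, network_api):
        self.network_api = network_api

    def _post_live_migration(self, instance, dest):
        # Step the reproducer makes fail (e.g. fetching network info).
        self.network_api.get_instance_nw_info(instance)
        instance["host"] = dest


class PostLiveMigrationFailureTest(unittest.TestCase):
    def test_failure_leaves_host_on_source(self):
        network_api = mock.Mock()
        network_api.get_instance_nw_info.side_effect = RuntimeError("boom")
        manager = ToyComputeManager(network_api)
        instance = {"host": "src"}

        with self.assertRaises(RuntimeError):
            manager._post_live_migration(instance, "dest")

        # Captures the buggy behaviour: the guest runs on "dest", but the
        # record still says "src". A fix would flip this assertion.
        self.assertEqual("src", instance["host"])
```

Run it with `python -m unittest <module>`. Writing the reproducer to assert the *current, wrong* behaviour first, then flipping the assertion in the fix, is a common way to show that a later patch actually changes the outcome.
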
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/791135
Committed: https://opendev.org/openstack/nova/commit/8449b7caefa4a5c0728e11380a088525f15ad6f5
Submitter: "Zuul (22348)"
Branch: master

commit 8449b7caefa4a5c0728e11380a088525f15ad6f5
Author: Sean Mooney <email address hidden>
Date: Thu May 13 12:48:21 2021 +0100

    [compute] always set instance.host in post_livemigration

    This change adds a new _post_live_migration_update_host
    function that wraps _post_live_migration and just ensures
    that if we exit due to an exception, instance.host is set
    to the destination host.

    When we are in _post_live_migration the guest has already
    started running on the destination host and we cannot revert.
    Sometimes admins or users will hard reboot the instance expecting
    that to fix everything when the vm enters the error state after
    the failed migration. Previously this would end up recreating the
    instance on the source node, leading to possible data corruption if
    the instance used shared storage.

    Change-Id: Ibc4bc7edf1c8d1e841c72c9188a0a62836e9f153
    Partial-Bug: #1628606

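A minimal sketch of the wrapper pattern this commit describes, assuming a simplified `Instance` and callable signature. The names are hypothetical; nova's real `_post_live_migration_update_host` has a different signature (context, migration data and more) and may update additional fields.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Instance:
    """Hypothetical stand-in for nova's Instance object."""
    host: str
    node: Optional[str] = None
    task_state: Optional[str] = "migrating"
    save_count: int = 0

    def save(self) -> None:
        self.save_count += 1  # stands in for a DB write


def post_live_migration_update_host(
        post_live_migration: Callable[[Instance, str], None],
        instance: Instance, dest: str, dest_node: Optional[str] = None) -> None:
    """Run post-migration; if it raises, make instance.host match reality."""
    try:
        post_live_migration(instance, dest)
    except Exception:
        # We cannot roll back: the guest is already running on dest.
        if instance.host != dest:
            instance.host = dest
            instance.node = dest_node
            instance.task_state = None
            instance.save()
        # Re-raise unchanged so the failure is still reported.
        raise
```

Unlike the 2018 Cinder patch, which logs and continues, this re-raises: the migration is still reported as failed, but the host field no longer points at the source, so a later hard reboot does not recreate the guest there.
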
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/zed)

Related fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/nova/+/861856

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/nova/+/861857

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/yoga)

Related fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/nova/+/861871

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/nova/+/861872

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/nova/+/861856
Committed: https://opendev.org/openstack/nova/commit/74a618a8118642c9fd32c4e0d502d12ac826affe
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 74a618a8118642c9fd32c4e0d502d12ac826affe
Author: Amit Uniyal <email address hidden>
Date: Thu Aug 25 05:08:44 2022 +0000

    Adds a repoducer for post live migration fail

    Adds a regression test or reproducer for post live migration
    fail at destination, the possible cause can be fail to get
    instance network info or block device info

    changes:
    adds updating server after _live_migrate in reproducer
    test (missed in main commit)

    Related-Bug: #1628606
    Change-Id: I48dbe0aae8a3943fdde69cda1bd663d70ea0eb19
    (cherry picked from commit a20baeca1f5ebb0dfe9607335a6986e9ed0e1725)

tags: added: in-stable-zed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/863792

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/xena)

Change abandoned by "Amit Uniyal <email address hidden>" on branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/863792
Reason: Test from GUI

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/xena)

Related fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/863864

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/nova/+/861857
Committed: https://opendev.org/openstack/nova/commit/643b0c7d35752b214eee19b8d7298a19a8493f6b
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 643b0c7d35752b214eee19b8d7298a19a8493f6b
Author: Sean Mooney <email address hidden>
Date: Thu May 13 12:48:21 2021 +0100

    [compute] always set instance.host in post_livemigration

    This change adds a new _post_live_migration_update_host
    function that wraps _post_live_migration and just ensures
    that if we exit due to an exception, instance.host is set
    to the destination host.

    When we are in _post_live_migration the guest has already
    started running on the destination host and we cannot revert.
    Sometimes admins or users will hard reboot the instance expecting
    that to fix everything when the vm enters the error state after
    the failed migration. Previously this would end up recreating the
    instance on the source node, leading to possible data corruption if
    the instance used shared storage.

    Change-Id: Ibc4bc7edf1c8d1e841c72c9188a0a62836e9f153
    Partial-Bug: #1628606
    (cherry picked from commit 8449b7caefa4a5c0728e11380a088525f15ad6f5)

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/863900

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/863901

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/863902

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/863903

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/nova/+/861871
Committed: https://opendev.org/openstack/nova/commit/71e5a1dbcc22aeaa798d3d06ce392cf73364b8db
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 71e5a1dbcc22aeaa798d3d06ce392cf73364b8db
Author: Amit Uniyal <email address hidden>
Date: Thu Aug 25 05:08:44 2022 +0000

    Adds a repoducer for post live migration fail

    Adds a regression test or reproducer for post live migration
    fail at destination, the possible cause can be fail to get
    instance network info or block device info

    changes:
    adds return server from _live_migrate in _integrated_helpers

    Related-Bug: #1628606
    Change-Id: I48dbe0aae8a3943fdde69cda1bd663d70ea0eb19
    (cherry picked from commit a20baeca1f5ebb0dfe9607335a6986e9ed0e1725)
    (cherry picked from commit 74a618a8118642c9fd32c4e0d502d12ac826affe)

tags: added: in-stable-yoga
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/864006

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/864007

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/863806

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/864055

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/nova/+/861872
Committed: https://opendev.org/openstack/nova/commit/17ae907569e45cc0f5c7da9511bb668a877b7b2e
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 17ae907569e45cc0f5c7da9511bb668a877b7b2e
Author: Sean Mooney <email address hidden>
Date: Thu May 13 12:48:21 2021 +0100

    [compute] always set instance.host in post_livemigration

    This change adds a new _post_live_migration_update_host
    function that wraps _post_live_migration and just ensures
    that if we exit due to an exception, instance.host is set
    to the destination host.

    When we are in _post_live_migration the guest has already
    started running on the destination host and we cannot revert.
    Sometimes admins or users will hard reboot the instance expecting
    that to fix everything when the vm enters the error state after
    the failed migration. Previously this would end up recreating the
    instance on the source node, leading to possible data corruption if
    the instance used shared storage.

    Change-Id: Ibc4bc7edf1c8d1e841c72c9188a0a62836e9f153
    Partial-Bug: #1628606
    (cherry picked from commit 8449b7caefa4a5c0728e11380a088525f15ad6f5)
    (cherry picked from commit 643b0c7d35752b214eee19b8d7298a19a8493f6b)

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/865381

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/865382

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/train)

Change abandoned by "Amit Uniyal <email address hidden>" on branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/864671
Reason: Added because I thought it's a good idea to have more test cases; abandoning because it's not really required from a backport perspective

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/c/openstack/nova/+/865381
Committed: https://opendev.org/openstack/nova/commit/d3b46af01b7afa1a9051cb440a7986bfcb1a59b1
Submitter: "Zuul (22348)"
Branch: stable/train

commit d3b46af01b7afa1a9051cb440a7986bfcb1a59b1
Author: Artom Lifshitz <email address hidden>
Date: Fri May 1 13:47:44 2020 -0400

    func: Add _live_migrate helper to InstanceHelperMixin

    This is a partial backport of I70c4715de05d64fabc498b02d5c757af9450fbe9
    that introduced this helper while addressing feedback on
    Ia3d7351c1805d98bcb799ab0375673c7f1cb8848 and
    I78e79112a9c803fb45d828cfb4641456da66364a that landed in Victoria.

    Follow-up for NUMA live migration functional tests

    This patch addresses outstanding feedback on
    Ia3d7351c1805d98bcb799ab0375673c7f1cb8848 and
    I78e79112a9c803fb45d828cfb4641456da66364a.

    Related-Bug: #1628606

    Change-Id: I70c4715de05d64fabc498b02d5c757af9450fbe9
    (cherry picked from commit ca8f1f422298b0a26cf30165595d256f4fa71135)
    (cherry picked from commit 726ca4aec5ccea96748de88b2c2a2fd1a078cfc5)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/nova/+/863864
Committed: https://opendev.org/openstack/nova/commit/5efcc3f695e02d61cb8b881e009308c2fef3aa58
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 5efcc3f695e02d61cb8b881e009308c2fef3aa58
Author: Amit Uniyal <email address hidden>
Date: Thu Aug 25 05:08:44 2022 +0000

    Adds a repoducer for post live migration fail

    Adds a regression test or reproducer for post live migration
    fail at destination, the possible cause can be fail to get
    instance network info or block device info

    changes:
    adds return server from _live_migrate in _integrated_helpers

    Related-Bug: #1628606
    Change-Id: I48dbe0aae8a3943fdde69cda1bd663d70ea0eb19
    (cherry picked from commit a20baeca1f5ebb0dfe9607335a6986e9ed0e1725)
    (cherry picked from commit 74a618a8118642c9fd32c4e0d502d12ac826affe)
    (cherry picked from commit 71e5a1dbcc22aeaa798d3d06ce392cf73364b8db)

tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/nova/+/863792
Committed: https://opendev.org/openstack/nova/commit/15502ddedc23e6591ace4e73fa8ce5b18b5644b0
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 15502ddedc23e6591ace4e73fa8ce5b18b5644b0
Author: Sean Mooney <email address hidden>
Date: Thu May 13 12:48:21 2021 +0100

    [compute] always set instance.host in post_livemigration

    This change adds a new _post_live_migration_update_host
    function that wraps _post_live_migration and just ensures
    that if we exit due to an exception, instance.host is set
    to the destination host.

    When we are in _post_live_migration the guest has already
    started running on the destination host and we cannot revert.
    Sometimes admins or users will hard reboot the instance expecting
    that to fix everything when the vm enters the error state after
    the failed migration. Previously this would end up recreating the
    instance on the source node, leading to possible data corruption if
    the instance used shared storage.

    Change-Id: Ibc4bc7edf1c8d1e841c72c9188a0a62836e9f153
    Partial-Bug: #1628606
    (cherry picked from commit 8449b7caefa4a5c0728e11380a088525f15ad6f5)
    (cherry picked from commit 643b0c7d35752b214eee19b8d7298a19a8493f6b)
    (cherry picked from commit 17ae907569e45cc0f5c7da9511bb668a877b7b2e)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/863900
Committed: https://opendev.org/openstack/nova/commit/ed1ea71489b60c0f95d76ab05f554cd046c60bac
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit ed1ea71489b60c0f95d76ab05f554cd046c60bac
Author: Amit Uniyal <email address hidden>
Date: Thu Aug 25 05:08:44 2022 +0000

    Adds a repoducer for post live migration fail

    Adds a regression test or reproducer for post live migration
    fail at destination, the possible cause can be fail to get
    instance network info or block device info

    changes:
    adds return server from _live_migrate in _integrated_helpers

    Related-Bug: #1628606
    Change-Id: I48dbe0aae8a3943fdde69cda1bd663d70ea0eb19
    (cherry picked from commit a20baeca1f5ebb0dfe9607335a6986e9ed0e1725)
    (cherry picked from commit 74a618a8118642c9fd32c4e0d502d12ac826affe)
    (cherry picked from commit 71e5a1dbcc22aeaa798d3d06ce392cf73364b8db)
    (cherry picked from commit 5efcc3f695e02d61cb8b881e009308c2fef3aa58)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/863901
Committed: https://opendev.org/openstack/nova/commit/43c0e40d288960760a6eaad05cb9670e01ef40d0
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 43c0e40d288960760a6eaad05cb9670e01ef40d0
Author: Sean Mooney <email address hidden>
Date: Thu May 13 12:48:21 2021 +0100

    [compute] always set instance.host in post_livemigration

    This change adds a new _post_live_migration_update_host
    function that wraps _post_live_migration and just ensures
    that if we exit due to an exception, instance.host is set
    to the destination host.

    When we are in _post_live_migration the guest has already
    started running on the destination host and we cannot revert.
    Sometimes admins or users will hard reboot the instance expecting
    that to fix everything when the vm enters the error state after
    the failed migration. Previously this would end up recreating the
    instance on the source node, leading to possible data corruption if
    the instance used shared storage.

    Change-Id: Ibc4bc7edf1c8d1e841c72c9188a0a62836e9f153
    Partial-Bug: #1628606
    (cherry picked from commit 8449b7caefa4a5c0728e11380a088525f15ad6f5)
    (cherry picked from commit 643b0c7d35752b214eee19b8d7298a19a8493f6b)
    (cherry picked from commit 17ae907569e45cc0f5c7da9511bb668a877b7b2e)
    (cherry picked from commit 15502ddedc23e6591ace4e73fa8ce5b18b5644b0)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/nova/+/863902
Committed: https://opendev.org/openstack/nova/commit/6dda4f7ca3f25a11cd0178352ad24fe2e8b74785
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 6dda4f7ca3f25a11cd0178352ad24fe2e8b74785
Author: Amit Uniyal <email address hidden>
Date: Thu Aug 25 05:08:44 2022 +0000

    Adds a repoducer for post live migration fail

    Adds a regression test or reproducer for post live migration
    fail at destination, the possible cause can be fail to get
    instance network info or block device info

    changes:
    adds return server from _live_migrate in _integrated_helpers

    Related-Bug: #1628606
    Change-Id: I48dbe0aae8a3943fdde69cda1bd663d70ea0eb19
    (cherry picked from commit a20baeca1f5ebb0dfe9607335a6986e9ed0e1725)
    (cherry picked from commit 74a618a8118642c9fd32c4e0d502d12ac826affe)
    (cherry picked from commit 71e5a1dbcc22aeaa798d3d06ce392cf73364b8db)
    (cherry picked from commit 5efcc3f695e02d61cb8b881e009308c2fef3aa58)
    (cherry picked from commit ed1ea71489b60c0f95d76ab05f554cd046c60bac)

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/nova/+/863903
Committed: https://opendev.org/openstack/nova/commit/0ac64bba8b7aba2fb358e00e970e88b32d26ef7e
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 0ac64bba8b7aba2fb358e00e970e88b32d26ef7e
Author: Sean Mooney <email address hidden>
Date: Thu May 13 12:48:21 2021 +0100

    [compute] always set instance.host in post_livemigration

    This change adds a new _post_live_migration_update_host
    function that wraps _post_live_migration and ensures that,
    if it exits due to an exception, instance.host is set
    to the destination host.

    When we are in _post_live_migration, the guest has already
    started running on the destination host and we cannot revert.
    Sometimes admins or users will hard reboot the instance, expecting
    that to fix everything when the VM enters the error state after
    the failed migration. Previously this would end up recreating the
    instance on the source node, leading to possible data corruption if
    the instance used shared storage.

    Change-Id: Ibc4bc7edf1c8d1e841c72c9188a0a62836e9f153
    Partial-Bug: #1628606
    (cherry picked from commit 8449b7caefa4a5c0728e11380a088525f15ad6f5)
    (cherry picked from commit 643b0c7d35752b214eee19b8d7298a19a8493f6b)
    (cherry picked from commit 17ae907569e45cc0f5c7da9511bb668a877b7b2e)
    (cherry picked from commit 15502ddedc23e6591ace4e73fa8ce5b18b5644b0)
    (cherry picked from commit 43c0e40d288960760a6eaad05cb9670e01ef40d0)
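
The wrapper pattern described in this commit can be sketched as follows. This is a simplified illustration, not nova's actual method: `Instance`, `post_live_migration` and `post_live_migration_update_host` are hypothetical names, and only the `host` field is modelled. The key points are that the update happens only on the exception path and that the original exception is re-raised unchanged.

```python
class Instance:
    """Hypothetical stand-in for nova's Instance object (host field only)."""

    def __init__(self, host):
        self.host = host
        self.saved = False

    def save(self):
        self.saved = True


def post_live_migration(instance, dest):
    """Hypothetical post-migration body that fails before updating the host."""
    raise RuntimeError("failed to disconnect volumes on the source")


def post_live_migration_update_host(instance, dest):
    """Sketch of the wrapper idea described in the commit message.

    By the time post-migration runs the guest is already executing on the
    destination and cannot be reverted, so on any exception the instance
    record is pointed at the destination before the error is re-raised.
    """
    try:
        post_live_migration(instance, dest)
    except Exception:
        if instance.host != dest:
            instance.host = dest
            instance.save()
        raise  # re-raise the original exception unchanged
```

Because a later hard reboot is directed at whichever host instance.host names, recording the destination here is what keeps that reboot from recreating the guest on the source.
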

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/nova/+/864006
Committed: https://opendev.org/openstack/nova/commit/5e955b62fa63b72816369a21af283a2b64f4af27
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 5e955b62fa63b72816369a21af283a2b64f4af27
Author: Amit Uniyal <email address hidden>
Date: Thu Aug 25 05:08:44 2022 +0000

    Adds a reproducer for post live migration fail

    Adds a regression test (reproducer) for a post live migration
    failure at the destination; a possible cause is a failure to get
    instance network info or block device info.

    Changes:
    _live_migrate in _integrated_helpers now returns the server.

    NOTE(auniyal): Differences
      * Replaced GlanceFixture with fake.stub_out_image_service in regression test, as GlanceFixture does not exist in Ussuri

    Related-Bug: #1628606
    Change-Id: I48dbe0aae8a3943fdde69cda1bd663d70ea0eb19
    (cherry picked from commit a20baeca1f5ebb0dfe9607335a6986e9ed0e1725)
    (cherry picked from commit 74a618a8118642c9fd32c4e0d502d12ac826affe)
    (cherry picked from commit 71e5a1dbcc22aeaa798d3d06ce392cf73364b8db)
    (cherry picked from commit 5efcc3f695e02d61cb8b881e009308c2fef3aa58)
    (cherry picked from commit ed1ea71489b60c0f95d76ab05f554cd046c60bac)
    (cherry picked from commit 6dda4f7ca3f25a11cd0178352ad24fe2e8b74785)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/nova/+/864007
Committed: https://opendev.org/openstack/nova/commit/3885f983c358e5a5f0b10f603633193ac335a45f
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 3885f983c358e5a5f0b10f603633193ac335a45f
Author: Sean Mooney <email address hidden>
Date: Thu May 13 12:48:21 2021 +0100

    [compute] always set instance.host in post_livemigration

    This change adds a new _post_live_migration_update_host
    function that wraps _post_live_migration and ensures that,
    if it exits due to an exception, instance.host is set
    to the destination host.

    When we are in _post_live_migration, the guest has already
    started running on the destination host and we cannot revert.
    Sometimes admins or users will hard reboot the instance, expecting
    that to fix everything when the VM enters the error state after
    the failed migration. Previously this would end up recreating the
    instance on the source node, leading to possible data corruption if
    the instance used shared storage.

    Change-Id: Ibc4bc7edf1c8d1e841c72c9188a0a62836e9f153
    Partial-Bug: #1628606
    (cherry picked from commit 8449b7caefa4a5c0728e11380a088525f15ad6f5)
    (cherry picked from commit 643b0c7d35752b214eee19b8d7298a19a8493f6b)
    (cherry picked from commit 17ae907569e45cc0f5c7da9511bb668a877b7b2e)
    (cherry picked from commit 15502ddedc23e6591ace4e73fa8ce5b18b5644b0)
    (cherry picked from commit 43c0e40d288960760a6eaad05cb9670e01ef40d0)
    (cherry picked from commit 0ac64bba8b7aba2fb358e00e970e88b32d26ef7e)

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/c/openstack/nova/+/865382
Committed: https://opendev.org/openstack/nova/commit/186db1751080effe0c674434bcd2c81b81ab8838
Submitter: "Zuul (22348)"
Branch: stable/train

commit 186db1751080effe0c674434bcd2c81b81ab8838
Author: Lee Yarwood <email address hidden>
Date: Wed Jul 29 10:51:34 2020 +0100

    func: Introduce a server_expected_state kwarg to InstanceHelperMixin._live_migrate

    Useful when testing live migration failures that leave the server in a
    non-ACTIVE state. This change also renames the migration_final_status
    arg to migration_expected_state within the method to keep it in line
    with _create_server.

    NOTE(artom): This is to facilitate subsequent backports of live
    migration regression tests and bug fixes.

    Partial-Bug: #1628606

    Change-Id: Ie0852a89fc9423a92baa7c29a8806c0628cae220
    (cherry picked from commit e70ddd621cb59a8845a4241387d8a49e443b7b69)
    (cherry picked from commit 2b0cf8edf88c5f81696d72b04098aa12d1137e90)
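
The real helper lives in nova's functional test mixin and talks to the compute API. The sketch below is a hypothetical, dependency-free version, assuming an `api` object with `live_migrate`, `migration_status` and `server_status` methods (invented for illustration). It shows why a `server_expected_state` kwarg helps: with a default of ACTIVE, existing callers are unaffected, while failure tests can wait for ERROR instead.

```python
import time


def wait_for_state(fetch, expected, timeout=5.0, interval=0.01):
    """Poll fetch() until it returns `expected` or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        current = fetch()
        if current == expected:
            return current
        if time.monotonic() >= deadline:
            raise AssertionError(f"expected {expected!r}, last saw {current!r}")
        time.sleep(interval)


def live_migrate(api, server_id, migration_expected_state="completed",
                 server_expected_state="ACTIVE"):
    """Hypothetical helper modelled on the kwargs described in the commit."""
    api.live_migrate(server_id)
    # Wait for the migration record first, then for the server itself.
    wait_for_state(lambda: api.migration_status(server_id),
                   migration_expected_state)
    wait_for_state(lambda: api.server_status(server_id),
                   server_expected_state)
    return server_id
```
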

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/c/openstack/nova/+/863806
Committed: https://opendev.org/openstack/nova/commit/3969228eebac197fc9bbb700c5d6999e06dd71c5
Submitter: "Zuul (22348)"
Branch: stable/train

commit 3969228eebac197fc9bbb700c5d6999e06dd71c5
Author: Amit Uniyal <email address hidden>
Date: Thu Aug 25 05:08:44 2022 +0000

    Adds a reproducer for post live migration fail

    Adds a regression test (reproducer) for a post live migration
    failure at the destination; a possible cause is a failure to get
    instance network info or block device info.

    Changes:
    _live_migrate in _integrated_helpers now returns the server.

    NOTE(auniyal): Differences
      * Replaced GlanceFixture with fake.stub_out_image_service in regression test, as GlanceFixture does not exist in Ussuri

    NOTE(auniyal): Differences from ussuri to train
      * integrated_helpers: Added the self.api parameter when calling
        wait_for_state_change.
      * regression: imported the mock module, as unittest.mock was adopted
        after the train release.
        As _create_server is not present in train, used
        _build_minimal_create_server instead of _create_server.

    Related-Bug: #1628606
    Change-Id: I48dbe0aae8a3943fdde69cda1bd663d70ea0eb19
    (cherry picked from commit a20baeca1f5ebb0dfe9607335a6986e9ed0e1725)
    (cherry picked from commit 74a618a8118642c9fd32c4e0d502d12ac826affe)
    (cherry picked from commit 71e5a1dbcc22aeaa798d3d06ce392cf73364b8db)
    (cherry picked from commit 5efcc3f695e02d61cb8b881e009308c2fef3aa58)
    (cherry picked from commit ed1ea71489b60c0f95d76ab05f554cd046c60bac)
    (cherry picked from commit 6dda4f7ca3f25a11cd0178352ad24fe2e8b74785)
    (cherry picked from commit 5e955b62fa63b72816369a21af283a2b64f4af27)
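
The train backport simply imports the third-party `mock` package. As an alternative (not what the backport does), test code that has to run unchanged across branches could use a guarded import; `mock` is a backport of `unittest.mock`, so `patch.object` behaves the same either way. The `Driver` class and `patched_info` function below are hypothetical and only exercise the patch API.

```python
try:
    from unittest import mock  # standard library on Python 3.3+
except ImportError:  # only reached on interpreters without unittest.mock
    import mock  # third-party backport, as imported by the train-era test


class Driver:
    """Hypothetical class used only to show the patch API."""

    def get_info(self):
        return "real"


def patched_info():
    """Return Driver().get_info() while it is patched with a canned value."""
    with mock.patch.object(Driver, "get_info", return_value="fake"):
        return Driver().get_info()
```
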

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/c/openstack/nova/+/864055
Committed: https://opendev.org/openstack/nova/commit/ec31d4d22e4163a37f2a63f387b2614189d77ff9
Submitter: "Zuul (22348)"
Branch: stable/train

commit ec31d4d22e4163a37f2a63f387b2614189d77ff9
Author: Sean Mooney <email address hidden>
Date: Thu May 13 12:48:21 2021 +0100

    [compute] always set instance.host in post_livemigration

    This change adds a new _post_live_migration_update_host
    function that wraps _post_live_migration and ensures that,
    if it exits due to an exception, instance.host is set
    to the destination host.

    When we are in _post_live_migration, the guest has already
    started running on the destination host and we cannot revert.
    Sometimes admins or users will hard reboot the instance, expecting
    that to fix everything when the VM enters the error state after
    the failed migration. Previously this would end up recreating the
    instance on the source node, leading to possible data corruption if
    the instance used shared storage.

    NOTE(auniyal): Differences from ussuri to train
      * nova/tests/unit/compute/test_compute_mgr.py
      * Set instance.migration_context to None, as fake_instance does not have this property in train

    Change-Id: Ibc4bc7edf1c8d1e841c72c9188a0a62836e9f153
    Partial-Bug: #1628606
    (cherry picked from commit 8449b7caefa4a5c0728e11380a088525f15ad6f5)
    (cherry picked from commit 643b0c7d35752b214eee19b8d7298a19a8493f6b)
    (cherry picked from commit 17ae907569e45cc0f5c7da9511bb668a877b7b2e)
    (cherry picked from commit 15502ddedc23e6591ace4e73fa8ce5b18b5644b0)
    (cherry picked from commit 43c0e40d288960760a6eaad05cb9670e01ef40d0)
    (cherry picked from commit 0ac64bba8b7aba2fb358e00e970e88b32d26ef7e)
    (cherry picked from commit 3885f983c358e5a5f0b10f603633193ac335a45f)
