live migration does not clean up at target node if a failure occurs during post migration

Bug #1628606 reported by Paul Carlton on 2016-09-28
This bug affects 4 people
Affects: OpenStack Compute (nova)
Importance: Low
Assigned to: Artom Lifshitz

Bug Description

If a live migration fails during the post processing on the source (e.g. a failure to disconnect volumes), it can lead to the instance being shut down on the source node and left in a migrating task state. The copy of the instance on the target node will also be left running, although it is not usable because neutron networking has not yet been switched to the target and nova still records the instance as being on the source node.

This situation can be resolved as follows:

On the target:
    virsh destroy <instance domain id>

If the compute nodes are NOT using shared storage:
    sudo rm -rf <instance uuid directory>

Then use the nova client as admin to restart the instance on the source node:
    nova reset-state --active <instance uuid>
    nova reboot --hard <instance uuid>
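
The same reset/reboot step can also be driven from the API. A minimal sketch, assuming python-novaclient and keystoneauth1 are installed; the credential values are placeholders for your own cloud, and the virsh cleanup on the target node still has to be done by hand as above:

    from keystoneauth1.identity import v3
    from keystoneauth1 import session
    from novaclient import client

    auth = v3.Password(auth_url='http://controller:5000/v3',  # placeholder
                       username='admin', password='secret',   # placeholders
                       project_name='admin',
                       user_domain_id='default', project_domain_id='default')
    nova = client.Client('2.1', session=session.Session(auth=auth))

    server = nova.servers.get('<instance uuid>')
    nova.servers.reset_state(server, state='active')  # nova reset-state --active
    nova.servers.reboot(server, reboot_type='HARD')   # nova reboot --hard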

I will investigate how to address this issue.

Changed in nova:
assignee: nobody → Paul Carlton (paul-carlton2)
Changed in nova:
importance: Undecided → Low
tags: added: live-migration
Paul Carlton (paul-carlton2) wrote :

Thinking about this, it is not that simple. Once the instance has been started on the target it could do work that would be lost if we destroy it and resurrect the instance on the source. As we found out when Matt Booth was fixing the post-copy network bug with certain neutron providers, the instance at the target becomes accessible to the network as soon as it starts up (due to ARPing), so once libvirt has un-paused the instance on the target and destroyed the instance on the source we are effectively beyond the point of no return.

The trouble is that the instance host does not get updated until the end of the post migration processing, so the instance still looks like it is on the source in a migrating state. If any step in post migration gives rise to an exception, the rest of post migration is skipped and the migration is marked as failed, but the instance is left as is.

The best solution I can think of is to wrap the call to the post method in a try/except that will set the instance to the target host if any exception occurs. Given that in some circumstances the source instance could still be present (i.e. not cleaned up) and the networking to the target might not be set up correctly, I'm thinking maybe the instance on the target should be placed in an error state to indicate that there may be an issue? Alternatively, is the fact that the migration status will be 'failed' enough to indicate that some further operator action might be needed?
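
A minimal sketch of that idea (not the actual nova code): run the post-migration step inside a guard that records the instance on the destination on any failure. `do_post_migration`, `instance` and `dest` are hypothetical stand-ins for the real objects:

    def guarded_post_live_migration(do_post_migration, instance, dest):
        try:
            do_post_migration()
        except Exception:
            # The guest is already running on the destination, so point the
            # record there even though cleanup did not finish, and flag the
            # instance for operator attention.
            instance.host = dest
            instance.task_state = None
            instance.vm_state = 'error'  # or rely on migration status 'failed'
            instance.save()
            raise  # preserve the traceback so the migration is marked failed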

Fix proposed to branch: master
Review: https://review.openstack.org/379491

Changed in nova:
status: New → In Progress

Change abandoned by Paul Carlton (<email address hidden>) on branch: master
Review: https://review.openstack.org/379491
Reason: Leaving for someone else to fix as they see fit

Since Paul Carlton abandoned his patch, removing him as assignee.

Changed in nova:
assignee: Paul Carlton (paul-carlton2) → nobody
status: In Progress → Confirmed
Matthew Booth (mbooth-9) wrote :

I think this bug is pretty serious. Say we get a Cinder error in driver.post_live_migration() (this specific example is taken from a customer bug):

ComputeManager._post_live_migration() does:

  ...
  self.driver.post_live_migration(ctxt, instance, block_device_info,
                                  migrate_data)
  ...
  self.compute_rpcapi.post_live_migration_at_destination(ctxt,
      instance, block_migration, dest)

The above code runs on the source compute. We update instance.host to the destination in post_live_migration_at_destination. Therefore, if driver.post_live_migration() above fails, we never call post_live_migration_at_destination, and we never update instance.host to point to the destination.

However, _post_live_migration is called via callback from the driver *after* migration has occurred. So at this point the VM is *actually running* on the destination, but Nova thinks it's still on the source. The instance will be in an error state, and a hard reboot at this point will cause it to start running again on the source, at which point it will be running on 2 compute hosts simultaneously.
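
Given that hazard, it is worth confirming where nova believes the instance lives before issuing a hard reboot. A small sketch using python-novaclient, where `nova` is a client built as in the earlier sketch (the `OS-EXT-SRV-ATTR:host` attribute is admin-only):

    def recorded_host(nova, instance_uuid):
        # The compute host nova believes owns this instance; while this bug
        # is in play it still names the source even though the guest is
        # actually running on the destination.
        server = nova.servers.get(instance_uuid)
        return getattr(server, 'OS-EXT-SRV-ATTR:host')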

Fix proposed to branch: master
Review: https://review.openstack.org/609517

Changed in nova:
assignee: nobody → Artom Lifshitz (notartom)
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/609517
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5513f48dea529fe4e690f50a462300129594210c
Submitter: Zuul
Branch: master

commit 5513f48dea529fe4e690f50a462300129594210c
Author: Artom Lifshitz <email address hidden>
Date: Wed Oct 10 14:53:14 2018 -0400

    Handle volume API failure in _post_live_migration

    Previously, if the call to Cinder in _post_live_migration failed, the
    exception went unhandled and prevented us from calling
    post_live_migration_at_destination - which is where we set instance
    host and task state. This left the system in an inconsistent state,
    with the instance actually running on the destination, but
    with instance.host still set to the source. This patch simply wraps
    the Cinder API calls in a try/except, and logs the exception instead
    of blowing up. While "dumb", this has the virtue of being simple and
    minimizing potential side effects. A comprehensive refactoring of
    when, where and how we set instance host and task state to try to
    guarantee consistency is left as a TODO.

    Partial-bug: 1628606
    Change-Id: Icb0bdaf454935b3713c35339394d260b33520de5
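
The shape of that change, condensed into a hedged sketch rather than the exact diff (the real code lives in nova/compute/manager.py; the function name and signatures below are illustrative): swallow and log volume API failures so the flow still reaches post_live_migration_at_destination.

    import logging

    LOG = logging.getLogger(__name__)

    def disconnect_volumes_best_effort(volume_api, context, bdms, connector):
        for bdm in bdms:
            try:
                # Either call can raise if Cinder is down; log and carry on
                # so post_live_migration_at_destination still runs and
                # updates instance.host.
                volume_api.terminate_connection(context, bdm.volume_id,
                                                connector)
                volume_api.detach(context, bdm.volume_id)
            except Exception:
                LOG.exception('Ignoring volume API failure for volume %s '
                              'during post live migration', bdm.volume_id)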

Reviewed: https://review.openstack.org/611083
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cf3c2f391ad0f9d2a2d94247509bcb2709413e4f
Submitter: Zuul
Branch: stable/rocky

commit cf3c2f391ad0f9d2a2d94247509bcb2709413e4f
Author: Artom Lifshitz <email address hidden>
Date: Wed Oct 10 14:53:14 2018 -0400

    Handle volume API failure in _post_live_migration

    Partial-bug: 1628606
    Change-Id: Icb0bdaf454935b3713c35339394d260b33520de5
    (cherry picked from commit 5513f48dea529fe4e690f50a462300129594210c)

tags: added: in-stable-rocky

Reviewed: https://review.openstack.org/611084
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=53f9c8e5104076872021ed33e221e5526ca5059b
Submitter: Zuul
Branch: stable/queens

commit 53f9c8e5104076872021ed33e221e5526ca5059b
Author: Artom Lifshitz <email address hidden>
Date: Wed Oct 10 14:53:14 2018 -0400

    Handle volume API failure in _post_live_migration

    Partial-bug: 1628606
    Change-Id: Icb0bdaf454935b3713c35339394d260b33520de5
    (cherry picked from commit 5513f48dea529fe4e690f50a462300129594210c)
    (cherry picked from commit cf3c2f391ad0f9d2a2d94247509bcb2709413e4f)

tags: added: in-stable-queens

Reviewed: https://review.openstack.org/611093
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=28bc3c8221c6f12bd9f1dd6701026f389b1fc2f6
Submitter: Zuul
Branch: stable/pike

commit 28bc3c8221c6f12bd9f1dd6701026f389b1fc2f6
Author: Artom Lifshitz <email address hidden>
Date: Wed Oct 10 14:53:14 2018 -0400

    Handle volume API failure in _post_live_migration

    Conflicts in nova/compute/manager.py due to absence of new Cinder flow
    conditional (and corresponding modifications to tests).

    Partial-bug: 1628606
    Change-Id: Icb0bdaf454935b3713c35339394d260b33520de5
    (cherry picked from commit 5513f48dea529fe4e690f50a462300129594210c)
    (cherry picked from commit cf3c2f391ad0f9d2a2d94247509bcb2709413e4f)
    (cherry picked from commit 53f9c8e5104076872021ed33e221e5526ca5059b)

tags: added: in-stable-pike
Matt Riedemann (mriedem) wrote :

Bug 1818873 is related, possibly a duplicate.

Change abandoned by huanhongda (<email address hidden>) on branch: stable/pike
Review: https://review.opendev.org/670016
Reason: Maybe cherry-pick 013f421bca4067bd430a9fac1e3b290cf1388ee4 is a better way.

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/pike
Review: https://review.opendev.org/670016
Reason: https://review.opendev.org/#/c/683008/ is better.
