Bug #1417667 “Move operations need to recalculate NUMA topology” : Bugs : OpenStack Compute (nova)

Revision history for this message

Chris Friesen (cbf123) wrote on 2015-02-03:

#1

On a related note, I just did a resize of an instance from a flavor with dedicated CPUs, 2 cpus and no specified numa information to a flavor with dedicated CPUs, 2 cpus and two NUMA nodes.

As part of the resize this ended up triggering a switch to another compute node. Also, the XML file for the instance being started up did not reflect the NUMA settings for the new flavor.

Lastly, as per above the host cpus were not re-evaluated on the new compute node and so ended up using host cpus that were already in use by another instance with dedicated CPUs.

melanie witt (melwitt) on 2015-02-13

Changed in nova:
importance:	Undecided → Medium
status:	New → Confirmed

Revision history for this message

Nikola Đipanov (ndipanov) wrote on 2015-02-17:

#2

Marking this one as duplicate as the other bug is slightly more generic than this one as this happens for any NUMA instance regardless of weather 1-1 CPU pinning was requested.

Revision history for this message

Chris Friesen (cbf123) wrote on 2015-02-17:

#3

What about recalculating CPUs during live migration? That doesn't go through the resize path so I'm not sure that 1370390 would apply.

Revision history for this message

Chris Friesen (cbf123) wrote on 2015-02-20:

#4

It's definitely true that bug #1370390 would cover the resize case that I mentioned in note #1. I think that we still need to keep this bug open for the more general case of live-migration, evacuate, rebuild, etc.

Chris Friesen (cbf123) on 2015-02-26

Changed in nova:
assignee:	nobody → Chris Friesen (cbf123)

Revision history for this message

Ian Jolliffe (ian-jolliffe) wrote on 2015-02-26: Re: [Bug 1417667] Re: migration/evacuation/rebuild/resize of instance with dedicated cpus needs to recalculate cpus on destination

#5

Thanks!

Ian

> On Feb 26, 2015, at 10:01 AM, Chris Friesen <email address hidden> wrote:
>
> ** Changed in: nova
> Assignee: (unassigned) => Chris Friesen (cbf123)
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1417667
>
> Title:
> migration/evacuation/rebuild/resize of instance with dedicated cpus
> needs to recalculate cpus on destination
>
> Status in OpenStack Compute (Nova):
> Confirmed
>
> Bug description:
> I'm running nova trunk, commit 752954a.
>
> I configured a flavor with two vcpus and extra specs
> "hw:cpu_policy=dedicated" in order to enable vcpu pinning.
>
> I booted up a number of instances such that there was one instance
> affined to host cpus 12 and 13 on compute-0, and another instance
> affined to cpus 12 and 13 on compute-2. (As reported by "virsh
> vcpupin" and "virsh dumpxml".)
>
> I then triggered a live migration of one instance from compute-0 to
> compute-2. This resulted in both instances being affined to host cpus
> 12 and 13 on compute-2.
>
> The "hw:cpu_policy=dedicated" extra spec is intended to provide
> dedicated host cpus for the instance. In order to provide this, on a
> live migration (or cold migration, or rebuild, or evacuation, or
> resize, etc.) nova needs to ensure that the instance is affined to
> host cpus that are not currently being used by other instances.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/nova/+bug/1417667/+subscriptions

Bart Wensley (bartwensley) on 2015-03-02

Changed in nova:
assignee:	Chris Friesen (cbf123) → Bart Wensley (bartwensley)

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-03-11: Fix proposed to nova (master)

#6

Fix proposed to branch: master
Review: https://review.openstack.org/163440

Changed in nova:
status:	Confirmed → In Progress

Revision history for this message

Bart Wensley (bartwensley) wrote on 2015-03-30: Re: migration/evacuation/rebuild/resize of instance with dedicated cpus needs to recalculate cpus on destination

#7

For reference, here are some details about the approach I am taking - I sent this to the openstack-dev list a few weeks back...

I have been exercising the numa topology related features in kilo (cpu pinning, numa topology, huge pages) and have seen that there are issues when an operation moves an instance between compute nodes. In summary, the numa_topology is not recalculated for the destination node, which results in the instance running with the wrong topology (or even failing to run if the topology isn't supported on the destination). This impacts live migration, cold migration, resize and evacuate.

I have spent some time over the last couple weeks and have a working fix for these issues that I would like to push upstream. The fix for cold migration and resize is the most straightforward, so I plan to start there.

At a high level, here is what I have done to fix cold migrate and resize:
- Add the source_numa_topology and dest_numa_topology to the migration
  object and migrations table.
- When a resize_claim is done, store the claimed numa topology in the
  dest_numa_topology in the migration record. Also store the current
  numa topology as the source_numa_topology in the migration record.
- Use the source_numa_topology and dest_numa_topology from the
  migration record in the resource accounting when referencing
  migration claims as appropriate. This is done for claims, dropped
  claims and the resource audit.
- Set the numa_topology in the instance after the cold migration/resize
  is finished to the dest_numa_topology from the migration object -
  done in finish_resize RPC on the destination compute to match where
  the rest of the resources for the instance are updated (there is a
  call to _set_instance_info here that sets the memory, vcpus, disk
  space, etc... for the migrated instance).
- Set the numa_topology in the instance if the cold migration/resize is
  reverted to the source_numa_topology from the migration object -
  done in finish_revert_resize RPC on the source compute.

I would appreciate any comments on my approach. I plan to start submitting the code for this against bug 1417667 - I will split it into several chunks to make it easier to review.

Fixing live migration was significantly more effort - I'll start a different thread on that once I have feedback on the above approach.

For reference, here are some details about the approach I am taking - I sent this to the openstack-dev list a few weeks back...

I have been exercising the numa topology related features in kilo (cpu pinning, numa topology, huge pages) and have seen that there are issues when an operation moves an instance between compute nodes. In summary, the numa_topology is not recalculated for the destination node, which results in the instance running with the wrong topology (or even failing to run if the topology isn't supported on the destination). This impacts live migration, cold migration, resize and evacuate.

I have spent some time over the last couple weeks and have a working fix for these issues that I would like to push upstream. The fix for cold migration and resize is the most straightforward, so I plan to start there.

At a high level, here is what I have done to fix cold migrate and resize:
- Add the source_numa_topology and dest_numa_topology to the migration
  object and migrations table.
- When a resize_claim is done, store the claimed numa topology in the
  dest_numa_topology in the migration record. Also store the current
  numa topology as the source_numa_topology in the migration record.
- Use the source_numa_topology and dest_numa_topology from the
  migration record in the resource accounting when referencing
  migration claims as appropriate. This is done for claims, dropped
  claims and the resource audit.
- Set the numa_topology in the instance after the cold migration/resize
  is finished to the dest_numa_topology from the migration object -
  done in finish_resize RPC on the destination compute to match where
  the rest of the resources for the instance are updated (there is a
  call to _set_instance_info here that sets the memory, vcpus, disk
  space, etc... for the migrated instance).
- Set the numa_topology in the instance if the cold migration/resize is
  reverted to the source_numa_topology from the migration object -
  done in finish_revert_resize RPC on the source compute.

I would appreciate any comments on my approach. I plan to start submitting the code for this against bug 1417667 - I will split it into several chunks to make it easier to review.

Fixing live migration was significantly more effort - I'll start a different thread on that once I have feedback on the above approach.

Pawel Koniszewski (pawel-koniszewski) on 2015-06-01

tags:

added: live-migrate

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-07-17: Change abandoned on nova (master)

#8

Change abandoned by Bart Wensley (<email address hidden>) on branch: master
Review: https://review.openstack.org/163440
Reason: Based on review comments and IRC discussions, a different approach is required. The new approach will use the new resource objects described here: https://blueprints.launchpad.net/nova/+spec/make-resource-tracker-use-objects

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-08-28: Fix proposed to nova (master)

#9

Fix proposed to branch: master
Review: https://review.openstack.org/218385

Changed in nova:
assignee:	Bart Wensley (bartwensley) → Nikola Đipanov (ndipanov)

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-08-28:

#10

Fix proposed to branch: master
Review: https://review.openstack.org/218500

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-08-31:

#11

Fix proposed to branch: master
Review: https://review.openstack.org/218938

John Garbutt (johngarbutt) on 2015-09-15

tags:

added: liberty-rc-potential

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-09-15: Fix merged to nova (master)

#12

Reviewed: https://review.openstack.org/218385
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c458921729821f3d2aad5a9b76fbc1f80dcd8962
Submitter: Jenkins
Branch: master

commit c458921729821f3d2aad5a9b76fbc1f80dcd8962
Author: Nikola Dipanov <email address hidden>
Date: Fri Aug 28 16:14:05 2015 +0100

Claims: Make sure move claims create a migration context records

    This way we will be able to keep track of the resources that were
    actually claimed for those resources (such as NUMA topology) which are
    not just a simple sum but actually have state.

    This puts in the infra for tracking the claimed resources, however 2
    additional steps are needed to fully use this and make resource tracking
    work for all instances when migrating:

    1) We don't use the resources claimed and stashed in the Migration
    context to do resource tracking. This will be a follow-on fix in the
    resource tracker.

    2) We still don't properly assign the newly claimed topology to the
    instance once the migration is confirmed - this will be done in a follow
    up patch that builds on this work.

    Change-Id: Ib0532f0cbebc5568d9b94b74b7a1555fcf871df5
    Partial-bug: #1417667
    Related-blueprint: migration-fix-resource-tracking

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-09-15:

#13

Reviewed: https://review.openstack.org/218500
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a298455398eb71f81a65782266450dcec444e56b
Submitter: Jenkins
Branch: master

commit a298455398eb71f81a65782266450dcec444e56b
Author: Nikola Dipanov <email address hidden>
Date: Fri Aug 28 23:23:02 2015 +0100

compute: migrate/resize paths properly handle stashed numa_topology

    This patch makes the migrate and resize code paths in the compute
    service take advantage of the now stashed data in the context_migration
    field of the instance to apply the claimed NUMA topology to the instance
    and provision it (and to potentially revert back to the old one on the
    source host)

    Change-Id: Ib91f211e87c1770c1997b3b8ff01d55092c896bf
    Partial-bug: #1417667
    Related-blueprint: migration-fix-resource-tracking

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-09-15:

#14

Reviewed: https://review.openstack.org/218938
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7fd3032f39cc6cc9b8e87834aa037bc4fb5e5620
Submitter: Jenkins
Branch: master

commit 7fd3032f39cc6cc9b8e87834aa037bc4fb5e5620
Author: Nikola Dipanov <email address hidden>
Date: Mon Aug 31 17:14:36 2015 +0100

RT: Migration resource tracking uses migration context

Now that we claim and stash the claimed resources (currently only NUMA
topology is tracked this way).

    This patch fixes the migration of instances with NUMA requirements as
    they will now be properly tracked and subsequent boot/migrate requests
    will see a consistent view of resources.

    Change-Id: I16137ed3a34622e1edaa61f1793fe38ea00ec251
    Partial-bug: #1417667
    Related-blueprint: migration-fix-resource-tracking

Revision history for this message

John Garbutt (johngarbutt) wrote on 2015-09-21: Re: migration/evacuation/rebuild/resize of instance with dedicated cpus needs to recalculate cpus on destination

#15

adding to RC1 because of the RPC change that would mean we can't backport the fix.

Changed in nova:
milestone:	none → liberty-rc1

Revision history for this message

Michael Still (mikal) wrote on 2015-09-22:

#16

The patches in comments 12 onwards are all merged. Is there anything else to do here?

Revision history for this message

John Garbutt (johngarbutt) wrote on 2015-09-22:

#17

This is the remaining patch in the series:
https://review.openstack.org/#/c/200485/

Revision history for this message

Nikola Đipanov (ndipanov) wrote on 2015-09-22:

#18

In addition those patches only solve some of this for resize and evacuate case. Live migration will still need work in Mitaka

Revision history for this message

Chris Friesen (cbf123) wrote on 2015-09-22:

#19

I haven't looked at the full set of patches yet...do they also handle reallocation of PCI/SRIOV devices over a resize/evacuate?

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-09-22: Related fix proposed to nova (master)

#20

Related fix proposed to branch: master
Review: https://review.openstack.org/226411

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-09-23: Related fix merged to nova (master)

#21

Reviewed: https://review.openstack.org/200485
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=dc0221d7240326a2d1b467e2a367bebb7e764e61
Submitter: Jenkins
Branch: master

commit dc0221d7240326a2d1b467e2a367bebb7e764e61
Author: Nikola Dipanov <email address hidden>
Date: Wed Aug 19 17:55:49 2015 +0100

rebuild: RPC sends additional args and claims are done

    If a migration record was created for an evacuation operation, we will
    want to pass it over to the compute node. It will be required for
    resource tracking, and we can avoid a round trip back to the conductor
    by making it a part of the RPC call.

Also - if we called the scheduler, we also know the node of the chosen
host, and the limits set, and this information is needed for doing claims.

We also make sure that rebuild now uses claims and resource tracking.

    There is a final piece of the puzzle missing, and that is the following
    patch which actually makes resource tracker update it's resources based
    on the evacuation records, and adds the relevant tests.

Related-bug: #1417667
Change-Id: I0233f964d8f294f0ffd9edcb16b1aaf93486177f

Revision history for this message

John Garbutt (johngarbutt) wrote on 2015-09-23: Re: migration/evacuation/rebuild/resize of instance with dedicated cpus needs to recalculate cpus on destination

#22

removing RC1 milestone, as we have merged what we need for RC1

Changed in nova:
milestone:	liberty-rc1 → none

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-09-23: Related fix merged to nova (master)

#23

Reviewed: https://review.openstack.org/226411
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4ee4f9f2ec639d3361d65dd2187f5bde48be9970
Submitter: Jenkins
Branch: master

commit 4ee4f9f2ec639d3361d65dd2187f5bde48be9970
Author: Nikola Dipanov <email address hidden>
Date: Tue Sep 22 16:58:20 2015 +0100

RT: track evacuation migrations

    The previous patch sets the stage for tracking resources when doing
    evacuate, but we still have to make sure that
    _update_usage_from_migration actually does the update of the resources.
    To do that we have to make sure we are getting the right flavor in case
    of 'evacuation' migration, and also that we are not completely skipping
    over it.

Once that's done claims and resource tracking work as expected when
evacuating instances.

DocImpact

Change-Id: Ie74939e543155bc42705b28e1b44d943ef54ebdc
Related-bug: #1417667

Matt Riedemann (mriedem) on 2015-09-23

tags:

removed: liberty-rc-potential

Paul Murray (pmurray) on 2015-11-06

tags:

added: live-migration
removed: live-migrate

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-11-12: Related fix proposed to nova (master)

#24

Related fix proposed to branch: master
Review: https://review.openstack.org/244484

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-11-12: Fix proposed to nova (master)

#25

Fix proposed to branch: master
Review: https://review.openstack.org/244485

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-11-12: Related fix proposed to nova (master)

#26

Related fix proposed to branch: master
Review: https://review.openstack.org/244486

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-11-12: Fix proposed to nova (master)

#27

Fix proposed to branch: master
Review: https://review.openstack.org/244488

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-11-12:

#28

Fix proposed to branch: master
Review: https://review.openstack.org/244489

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-12-08: Related fix merged to nova (master)

#29

Reviewed: https://review.openstack.org/244484
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0bb21d5173bc0f2d4a3a9455e54a50befff7b581
Submitter: Jenkins
Branch: master

commit 0bb21d5173bc0f2d4a3a9455e54a50befff7b581
Author: Nikola Dipanov <email address hidden>
Date: Fri Oct 16 14:51:39 2015 +0100

compute: split check_can_live_migrate_destination

    We'll want to add claims around the logic of this method, so it's better
    if there is a nested method that does the business logic to
    avoid deep nesting that makes the code less readable.

Related-bug: 1417667
Change-Id: I8c2f1a67846fe27dbe350a288d7e9be6be109b90

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-12-08: Fix merged to nova (master)

#30

Reviewed: https://review.openstack.org/244485
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f73fe579a6b702450bc2b2d81f2813d3ef4c39eb
Submitter: Jenkins
Branch: master

commit f73fe579a6b702450bc2b2d81f2813d3ef4c39eb
Author: Nikola Dipanov <email address hidden>
Date: Fri Oct 16 16:36:49 2015 +0100

live-migrate: Change the status Migration is created with

    We want to be consistent with what resize does here, and there should be
    no difference with regards to resource tracking (i.e. resources now
    should not be tracked for this state).

Partial-bug: 1417667
Change-Id: I75f914fb28e337df48e8e68ae7e407b5239d2647

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-12-08: Related fix merged to nova (master)

#31

Reviewed: https://review.openstack.org/244486
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2c1f8e6002850a38c82739839be7245b786c9c0c
Submitter: Jenkins
Branch: master

commit 2c1f8e6002850a38c82739839be7245b786c9c0c
Author: Nikola Dipanov <email address hidden>
Date: Fri Oct 16 17:18:56 2015 +0100

Don't track migrations in 'accepted' state

    This state was added during Liberty for evacuate, and since it's the
    first state that is assigned to the migration record on it's creation,
    and before we start using it for resource tracking, there is no reason
    to consider those records as "in progress".

Change-Id: I2a9bbe87bc1518cee9e43e4eeec1fa585305f423
Related-bug: 1417667

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-12-17: Fix merged to nova (master)

#32

Reviewed: https://review.openstack.org/244488
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5a8ba1d69dffa6943ca3cbf9e16c19191982c0a6
Submitter: Jenkins
Branch: master

commit 5a8ba1d69dffa6943ca3cbf9e16c19191982c0a6
Author: Nikola Dipanov <email address hidden>
Date: Thu Nov 12 05:55:28 2015 +0000

live-mig: Mark migration as failed on fail to schedule

    In case we reach maximum retires to schedule - we want to mark the
    migration record as failed, since the whole operation will be aborted at
    that point.

Change-Id: I3528e98b2dc596995556aec83bb65104b671571d
Partial-bug: #1417667

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-03-01: Related fix proposed to nova (master)

#33

Related fix proposed to branch: master
Review: https://review.openstack.org/286742

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-03-01: Fix proposed to nova (master)

#34

Fix proposed to branch: master
Review: https://review.openstack.org/286744

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-04-11: Related fix proposed to nova (stable/liberty)

#35

Related fix proposed to branch: stable/liberty
Review: https://review.openstack.org/304015

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-05-12: Related fix merged to nova (stable/liberty)

#36

Reviewed: https://review.openstack.org/304015
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fa3fb5e61f1648bc94b426451136da763b1cc73f
Submitter: Jenkins
Branch: stable/liberty

commit fa3fb5e61f1648bc94b426451136da763b1cc73f
Author: Nikola Dipanov <email address hidden>
Date: Fri Oct 16 17:18:56 2015 +0100

Don't track migrations in 'accepted' state

    This state was added during Liberty for evacuate, and since it's the
    first state that is assigned to the migration record on it's creation,
    and before we start using it for resource tracking, there is no reason
    to consider those records as "in progress".

    Change-Id: I2a9bbe87bc1518cee9e43e4eeec1fa585305f423
    Related-bug: 1417667
    (cherry picked from commit 2c1f8e6002850a38c82739839be7245b786c9c0c)

tags:

added: in-stable-liberty

OpenStack Infra (hudson-openstack) on 2016-06-01

Changed in nova:
assignee:	Nikola Đipanov (ndipanov) → Sylvain Bauza (sylvain-bauza)

OpenStack Infra (hudson-openstack) on 2016-06-13

Changed in nova:
assignee:	Sylvain Bauza (sylvain-bauza) → sahid (sahid-ferdjaoui)

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-06-16: Change abandoned on nova (master)

#37

Change abandoned by Daniel Berrange (<email address hidden>) on branch: master
Review: https://review.openstack.org/286742
Reason: Abadoning since its obsolet & nikola no longer works on nova

OpenStack Infra (hudson-openstack) on 2016-07-05

Changed in nova:
assignee:	sahid (sahid-ferdjaoui) → Sylvain Bauza (sylvain-bauza)

OpenStack Infra (hudson-openstack) on 2016-07-06

Changed in nova:
assignee:	Sylvain Bauza (sylvain-bauza) → sahid (sahid-ferdjaoui)

OpenStack Infra (hudson-openstack) on 2016-10-06

Changed in nova:
assignee:	sahid (sahid-ferdjaoui) → Stephen Finucane (stephenfinucane)

OpenStack Infra (hudson-openstack) on 2016-10-20

Changed in nova:
assignee:	Stephen Finucane (stephenfinucane) → Pawel Koniszewski (pawel-koniszewski)

OpenStack Infra (hudson-openstack) on 2016-11-02

Changed in nova:
assignee:	Pawel Koniszewski (pawel-koniszewski) → sahid (sahid-ferdjaoui)

OpenStack Infra (hudson-openstack) on 2016-11-07

Changed in nova:
assignee:	sahid (sahid-ferdjaoui) → Pawel Koniszewski (pawel-koniszewski)

Revision history for this message

Stephen Finucane (stephenfinucane) wrote on 2016-12-09: Re: migration/evacuation/rebuild/resize of instance with NUMA topology needs to recalculate NUMA topology

#38

The overall NUMA topology is affected by this issues including huge pages

summary:

- migration/evacuation/rebuild/resize of instance with dedicated cpus
- needs to recalculate cpus on destination
+ migration/evacuation/rebuild/resize of instance with NUMA topology needs
+ to recalculate NUMA topology

OpenStack Infra (hudson-openstack) on 2017-02-09

Changed in nova:
assignee:	Pawel Koniszewski (pawel-koniszewski) → Andrey Volkov (avolkov)

Revision history for this message

Hua Zhang (zhhuabj) wrote on 2017-07-17:

#39

Have all full set of patches for live-migration case been uploaded yet ?

Andrey Volkov (avolkov) on 2017-07-18

Changed in nova:
assignee:	Andrey Volkov (avolkov) → nobody

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2017-08-01: Change abandoned on nova (master)

#40

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/244489
Reason: This review is > 4 weeks without comment, and is not mergable in it's current state. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2017-08-01:

#41

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/286744
Reason: This review is > 4 weeks without comment, and is not mergable in it's current state. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message

Chris Friesen (cbf123) wrote on 2017-08-08: Re: migration/evacuation/rebuild/resize of instance with NUMA topology needs to recalculate NUMA topology

#42

Hua Zhang (zhhuabj): the live-migration patches were under review (see https://review.openstack.org/#/c/286744 and the related patches) but were never merged.

Gayathri Devi (gayathrik) on 2017-09-07

Changed in nova:
assignee:	nobody → Gayathri Devi (gayathrik)

Revision history for this message

Gayathri Devi (gayathrik) wrote on 2017-09-07:

#43

This commit solves this issue in evacuate without host and in evacuate with host scenarios.
https://review.openstack.org/#/c/468227/

I tested below scenario where I have only two compute hosts/nodes available in my environment.
Booted VM1 (Vcpus tagged are 34,14,10,30) on compute host1
Booted VM2 (Vcpus tagged are 34,14,10,30) on compute host2
Evacuated VM1 by specifying target host as compute host2.
After evacuation, VM1 is moved to compute host2 successfully with non overlapping vcpus(28,32,12,16) tagged to it.

Please confirm whether this is actual scenario you are supposed to test?

Revision history for this message

Chris Friesen (cbf123) wrote on 2017-09-07:

#44

gayathrik, I think the issues with cold migration, evacuate, and resize have been dealt with. Live migration is still broken, see my comment #42 for the relevent patches.

Gayathri Devi (gayathrik) on 2017-09-11

Changed in nova:
assignee:	Gayathri Devi (gayathrik) → nobody

OpenStack Infra (hudson-openstack) on 2017-11-23

Changed in nova:
assignee:	nobody → Stephen Finucane (stephenfinucane)

OpenStack Infra (hudson-openstack) on 2018-01-17

Changed in nova:
assignee:	Stephen Finucane (stephenfinucane) → sahid (sahid-ferdjaoui)

Sylvain Bauza (sylvain-bauza) on 2018-02-13

summary:

- migration/evacuation/rebuild/resize of instance with NUMA topology needs
- to recalculate NUMA topology
+ migration/evacuation/rebuild/resize/shelve of instance with NUMA
+ topology needs to recalculate NUMA topology

Chris Friesen (cbf123) on 2018-02-13

summary:

- migration/evacuation/rebuild/resize/shelve of instance with NUMA
+ migration/evacuation/rebuild/resize/unshelve of instance with NUMA
topology needs to recalculate NUMA topology

Revision history for this message

Matt Riedemann (mriedem) wrote on 2018-04-05: Re: migration/evacuation/rebuild/resize/unshelve of instance with NUMA topology needs to recalculate NUMA topology

#45

This bug report is confusing. It mentions 'rebuild' but there is no move operation for a rebuild, nor a new resource claim.

Looking at the history of this:

https://review.openstack.org/#/q/topic:bug/1417667+(status:open+OR+status:merged)

It looks like https://review.openstack.org/#/c/226411/ fixes evacuate, and https://review.openstack.org/#/c/218938/ fixes cold migrate/resize.

Live migration is a known gap yet, which is probably why this bug is still open, and that's being worked on in Rocky with a new spec:

https://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/numa-aware-live-migration.html

Matt Riedemann (mriedem) on 2018-04-05

summary:

- migration/evacuation/rebuild/resize/unshelve of instance with NUMA
- topology needs to recalculate NUMA topology
+ migration/evacuation/resize/unshelve of instance with NUMA topology
+ needs to recalculate NUMA topology

Revision history for this message

Chris Friesen (cbf123) wrote on 2018-04-05: Re: migration/evacuation/resize/unshelve of instance with NUMA topology needs to recalculate NUMA topology

#46

Removed mention of "rebuild", you're correct that no additional resources are needed there.

Yes, live migration is the remaining sticking point and why this bug is still open.

description:

updated

Revision history for this message

Matt Riedemann (mriedem) wrote on 2018-04-05:

#47

Per comments 45 and 46, I've marked this bug as fix released for evacuate/cold migrate and as noted, live migration with NUMA is being handled in the Rocky release via the linked spec.

Changed in nova:
status:	In Progress → Fix Released
assignee:	sahid (sahid-ferdjaoui) → Nikola Đipanov (ndipanov)

Revision history for this message

Chris Friesen (cbf123) wrote on 2018-04-05:

#48

There is also a separate bug report for live migration: https://bugs.launchpad.net/nova/+bug/1289064

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2018-10-16: Change abandoned on nova (master)

#49

Change abandoned by Stephen Finucane (<email address hidden>) on branch: master
Review: https://review.openstack.org/244489
Reason: Safe to say this is dead in the water and should finally be put out of its misery. artom: your turn.

Stephen Finucane (stephenfinucane) on 2018-10-16

Changed in nova:
assignee:	Nikola Đipanov (ndipanov) → Artom Lifshitz (notartom)
assignee:	Artom Lifshitz (notartom) → nobody
summary:	- migration/evacuation/resize/unshelve of instance with NUMA topology + cold migration/evacuation/resize/unshelve of instance with NUMA topology needs to recalculate NUMA topology
summary:	- cold migration/evacuation/resize/unshelve of instance with NUMA topology - needs to recalculate NUMA topology + Move operations need to recalculate NUMA topology
description:	updated

OpenStack Compute (nova)

Move operations need to recalculate NUMA topology

Bug Description

Duplicates of this bug

Other bug subscribers

Related blueprints

Remote bug watches