migration/evacuation/rebuild/resize of instance with NUMA topology needs to recalculate NUMA topology

Bug #1417667 reported by Chris Friesen on 2015-02-03
120
This bug affects 21 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Medium
Andrey Volkov

Bug Description

I'm running nova trunk, commit 752954a.

I configured a flavor with two vcpus and extra specs "hw:cpu_policy=dedicated" in order to enable vcpu pinning.

I booted up a number of instances such that there was one instance affined to host cpus 12 and 13 on compute-0, and another instance affined to cpus 12 and 13 on compute-2. (As reported by "virsh vcpupin" and "virsh dumpxml".)

I then triggered a live migration of one instance from compute-0 to compute-2. This resulted in both instances being affined to host cpus 12 and 13 on compute-2.

The "hw:cpu_policy=dedicated" extra spec is intended to provide dedicated host cpus for the instance. In order to provide this, on a live migration (or cold migration, or rebuild, or evacuation, or resize, etc.) nova needs to ensure that the instance is affined to host cpus that are not currently being used by other instances.

Chris Friesen (cbf123) wrote :

On a related note, I just did a resize of an instance from a flavor with dedicated CPUs, 2 cpus and no specified numa information to a flavor with dedicated CPUs, 2 cpus and two NUMA nodes.

As part of the resize this ended up triggering a switch to another compute node. Also, the XML file for the instance being started up did not reflect the NUMA settings for the new flavor.

Lastly, as per above the host cpus were not re-evaluated on the new compute node and so ended up using host cpus that were already in use by another instance with dedicated CPUs.

melanie witt (melwitt) on 2015-02-13
Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
Nikola Đipanov (ndipanov) wrote :

Marking this one as duplicate as the other bug is slightly more generic than this one as this happens for any NUMA instance regardless of weather 1-1 CPU pinning was requested.

Chris Friesen (cbf123) wrote :

What about recalculating CPUs during live migration? That doesn't go through the resize path so I'm not sure that 1370390 would apply.

Chris Friesen (cbf123) wrote :

It's definitely true that bug #1370390 would cover the resize case that I mentioned in note #1. I think that we still need to keep this bug open for the more general case of live-migration, evacuate, rebuild, etc.

Chris Friesen (cbf123) on 2015-02-26
Changed in nova:
assignee: nobody → Chris Friesen (cbf123)

Thanks!

Ian

> On Feb 26, 2015, at 10:01 AM, Chris Friesen <email address hidden> wrote:
>
> ** Changed in: nova
> Assignee: (unassigned) => Chris Friesen (cbf123)
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1417667
>
> Title:
> migration/evacuation/rebuild/resize of instance with dedicated cpus
> needs to recalculate cpus on destination
>
> Status in OpenStack Compute (Nova):
> Confirmed
>
> Bug description:
> I'm running nova trunk, commit 752954a.
>
> I configured a flavor with two vcpus and extra specs
> "hw:cpu_policy=dedicated" in order to enable vcpu pinning.
>
> I booted up a number of instances such that there was one instance
> affined to host cpus 12 and 13 on compute-0, and another instance
> affined to cpus 12 and 13 on compute-2. (As reported by "virsh
> vcpupin" and "virsh dumpxml".)
>
> I then triggered a live migration of one instance from compute-0 to
> compute-2. This resulted in both instances being affined to host cpus
> 12 and 13 on compute-2.
>
> The "hw:cpu_policy=dedicated" extra spec is intended to provide
> dedicated host cpus for the instance. In order to provide this, on a
> live migration (or cold migration, or rebuild, or evacuation, or
> resize, etc.) nova needs to ensure that the instance is affined to
> host cpus that are not currently being used by other instances.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/nova/+bug/1417667/+subscriptions

Changed in nova:
assignee: Chris Friesen (cbf123) → Bart Wensley (bartwensley)

Fix proposed to branch: master
Review: https://review.openstack.org/163440

Changed in nova:
status: Confirmed → In Progress

For reference, here are some details about the approach I am taking - I sent this to the openstack-dev list a few weeks back...

I have been exercising the numa topology related features in kilo (cpu pinning, numa topology, huge pages) and have seen that there are issues when an operation moves an instance between compute nodes. In summary, the numa_topology is not recalculated for the destination node, which results in the instance running with the wrong topology (or even failing to run if the topology isn't supported on the destination). This impacts live migration, cold migration, resize and evacuate.

I have spent some time over the last couple weeks and have a working fix for these issues that I would like to push upstream. The fix for cold migration and resize is the most straightforward, so I plan to start there.

At a high level, here is what I have done to fix cold migrate and resize:
- Add the source_numa_topology and dest_numa_topology to the migration
  object and migrations table.
- When a resize_claim is done, store the claimed numa topology in the
  dest_numa_topology in the migration record. Also store the current
  numa topology as the source_numa_topology in the migration record.
- Use the source_numa_topology and dest_numa_topology from the
  migration record in the resource accounting when referencing
  migration claims as appropriate. This is done for claims, dropped
  claims and the resource audit.
- Set the numa_topology in the instance after the cold migration/resize
  is finished to the dest_numa_topology from the migration object -
  done in finish_resize RPC on the destination compute to match where
  the rest of the resources for the instance are updated (there is a
  call to _set_instance_info here that sets the memory, vcpus, disk
  space, etc... for the migrated instance).
- Set the numa_topology in the instance if the cold migration/resize is
  reverted to the source_numa_topology from the migration object -
  done in finish_revert_resize RPC on the source compute.

I would appreciate any comments on my approach. I plan to start submitting the code for this against bug 1417667 - I will split it into several chunks to make it easier to review.

Fixing live migration was significantly more effort - I'll start a different thread on that once I have feedback on the above approach.

tags: added: live-migrate

Change abandoned by Bart Wensley (<email address hidden>) on branch: master
Review: https://review.openstack.org/163440
Reason: Based on review comments and IRC discussions, a different approach is required. The new approach will use the new resource objects described here: https://blueprints.launchpad.net/nova/+spec/make-resource-tracker-use-objects

Fix proposed to branch: master
Review: https://review.openstack.org/218385

Changed in nova:
assignee: Bart Wensley (bartwensley) → Nikola Đipanov (ndipanov)

Fix proposed to branch: master
Review: https://review.openstack.org/218500

Fix proposed to branch: master
Review: https://review.openstack.org/218938

tags: added: liberty-rc-potential

Reviewed: https://review.openstack.org/218385
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c458921729821f3d2aad5a9b76fbc1f80dcd8962
Submitter: Jenkins
Branch: master

commit c458921729821f3d2aad5a9b76fbc1f80dcd8962
Author: Nikola Dipanov <email address hidden>
Date: Fri Aug 28 16:14:05 2015 +0100

    Claims: Make sure move claims create a migration context records

    This way we will be able to keep track of the resources that were
    actually claimed for those resources (such as NUMA topology) which are
    not just a simple sum but actually have state.

    This puts in the infra for tracking the claimed resources, however 2
    additional steps are needed to fully use this and make resource tracking
    work for all instances when migrating:

    1) We don't use the resources claimed and stashed in the Migration
    context to do resource tracking. This will be a follow-on fix in the
    resource tracker.

    2) We still don't properly assign the newly claimed topology to the
    instance once the migration is confirmed - this will be done in a follow
    up patch that builds on this work.

    Change-Id: Ib0532f0cbebc5568d9b94b74b7a1555fcf871df5
    Partial-bug: #1417667
    Related-blueprint: migration-fix-resource-tracking

Reviewed: https://review.openstack.org/218500
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a298455398eb71f81a65782266450dcec444e56b
Submitter: Jenkins
Branch: master

commit a298455398eb71f81a65782266450dcec444e56b
Author: Nikola Dipanov <email address hidden>
Date: Fri Aug 28 23:23:02 2015 +0100

    compute: migrate/resize paths properly handle stashed numa_topology

    This patch makes the migrate and resize code paths in the compute
    service take advantage of the now stashed data in the context_migration
    field of the instance to apply the claimed NUMA topology to the instance
    and provision it (and to potentially revert back to the old one on the
    source host)

    Change-Id: Ib91f211e87c1770c1997b3b8ff01d55092c896bf
    Partial-bug: #1417667
    Related-blueprint: migration-fix-resource-tracking

Reviewed: https://review.openstack.org/218938
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7fd3032f39cc6cc9b8e87834aa037bc4fb5e5620
Submitter: Jenkins
Branch: master

commit 7fd3032f39cc6cc9b8e87834aa037bc4fb5e5620
Author: Nikola Dipanov <email address hidden>
Date: Mon Aug 31 17:14:36 2015 +0100

    RT: Migration resource tracking uses migration context

    Now that we claim and stash the claimed resources (currently only NUMA
    topology is tracked this way).

    This patch fixes the migration of instances with NUMA requirements as
    they will now be properly tracked and subsequent boot/migrate requests
    will see a consistent view of resources.

    Change-Id: I16137ed3a34622e1edaa61f1793fe38ea00ec251
    Partial-bug: #1417667
    Related-blueprint: migration-fix-resource-tracking

adding to RC1 because of the RPC change that would mean we can't backport the fix.

Changed in nova:
milestone: none → liberty-rc1
Michael Still (mikal) wrote :

The patches in comments 12 onwards are all merged. Is there anything else to do here?

John Garbutt (johngarbutt) wrote :

This is the remaining patch in the series:
https://review.openstack.org/#/c/200485/

Nikola Đipanov (ndipanov) wrote :

In addition those patches only solve some of this for resize and evacuate case. Live migration will still need work in Mitaka

Chris Friesen (cbf123) wrote :

I haven't looked at the full set of patches yet...do they also handle reallocation of PCI/SRIOV devices over a resize/evacuate?

Reviewed: https://review.openstack.org/200485
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=dc0221d7240326a2d1b467e2a367bebb7e764e61
Submitter: Jenkins
Branch: master

commit dc0221d7240326a2d1b467e2a367bebb7e764e61
Author: Nikola Dipanov <email address hidden>
Date: Wed Aug 19 17:55:49 2015 +0100

    rebuild: RPC sends additional args and claims are done

    If a migration record was created for an evacuation operation, we will
    want to pass it over to the compute node. It will be required for
    resource tracking, and we can avoid a round trip back to the conductor
    by making it a part of the RPC call.

    Also - if we called the scheduler, we also know the node of the chosen
    host, and the limits set, and this information is needed for doing claims.

    We also make sure that rebuild now uses claims and resource tracking.

    There is a final piece of the puzzle missing, and that is the following
    patch which actually makes resource tracker update it's resources based
    on the evacuation records, and adds the relevant tests.

    Related-bug: #1417667
    Change-Id: I0233f964d8f294f0ffd9edcb16b1aaf93486177f

removing RC1 milestone, as we have merged what we need for RC1

Changed in nova:
milestone: liberty-rc1 → none

Reviewed: https://review.openstack.org/226411
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4ee4f9f2ec639d3361d65dd2187f5bde48be9970
Submitter: Jenkins
Branch: master

commit 4ee4f9f2ec639d3361d65dd2187f5bde48be9970
Author: Nikola Dipanov <email address hidden>
Date: Tue Sep 22 16:58:20 2015 +0100

    RT: track evacuation migrations

    The previous patch sets the stage for tracking resources when doing
    evacuate, but we still have to make sure that
    _update_usage_from_migration actually does the update of the resources.
    To do that we have to make sure we are getting the right flavor in case
    of 'evacuation' migration, and also that we are not completely skipping
    over it.

    Once that's done claims and resource tracking work as expected when
    evacuating instances.

    DocImpact

    Change-Id: Ie74939e543155bc42705b28e1b44d943ef54ebdc
    Related-bug: #1417667

Matt Riedemann (mriedem) on 2015-09-23
tags: removed: liberty-rc-potential
Paul Murray (pmurray) on 2015-11-06
tags: added: live-migration
removed: live-migrate

Fix proposed to branch: master
Review: https://review.openstack.org/244489

Reviewed: https://review.openstack.org/244484
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0bb21d5173bc0f2d4a3a9455e54a50befff7b581
Submitter: Jenkins
Branch: master

commit 0bb21d5173bc0f2d4a3a9455e54a50befff7b581
Author: Nikola Dipanov <email address hidden>
Date: Fri Oct 16 14:51:39 2015 +0100

    compute: split check_can_live_migrate_destination

    We'll want to add claims around the logic of this method, so it's better
    if there is a nested method that does the business logic to
    avoid deep nesting that makes the code less readable.

    Related-bug: 1417667
    Change-Id: I8c2f1a67846fe27dbe350a288d7e9be6be109b90

Reviewed: https://review.openstack.org/244485
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f73fe579a6b702450bc2b2d81f2813d3ef4c39eb
Submitter: Jenkins
Branch: master

commit f73fe579a6b702450bc2b2d81f2813d3ef4c39eb
Author: Nikola Dipanov <email address hidden>
Date: Fri Oct 16 16:36:49 2015 +0100

    live-migrate: Change the status Migration is created with

    We want to be consistent with what resize does here, and there should be
    no difference with regards to resource tracking (i.e. resources now
    should not be tracked for this state).

    Partial-bug: 1417667
    Change-Id: I75f914fb28e337df48e8e68ae7e407b5239d2647

Reviewed: https://review.openstack.org/244486
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2c1f8e6002850a38c82739839be7245b786c9c0c
Submitter: Jenkins
Branch: master

commit 2c1f8e6002850a38c82739839be7245b786c9c0c
Author: Nikola Dipanov <email address hidden>
Date: Fri Oct 16 17:18:56 2015 +0100

    Don't track migrations in 'accepted' state

    This state was added during Liberty for evacuate, and since it's the
    first state that is assigned to the migration record on it's creation,
    and before we start using it for resource tracking, there is no reason
    to consider those records as "in progress".

    Change-Id: I2a9bbe87bc1518cee9e43e4eeec1fa585305f423
    Related-bug: 1417667

Reviewed: https://review.openstack.org/244488
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5a8ba1d69dffa6943ca3cbf9e16c19191982c0a6
Submitter: Jenkins
Branch: master

commit 5a8ba1d69dffa6943ca3cbf9e16c19191982c0a6
Author: Nikola Dipanov <email address hidden>
Date: Thu Nov 12 05:55:28 2015 +0000

    live-mig: Mark migration as failed on fail to schedule

    In case we reach maximum retires to schedule - we want to mark the
    migration record as failed, since the whole operation will be aborted at
    that point.

    Change-Id: I3528e98b2dc596995556aec83bb65104b671571d
    Partial-bug: #1417667

Reviewed: https://review.openstack.org/304015
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fa3fb5e61f1648bc94b426451136da763b1cc73f
Submitter: Jenkins
Branch: stable/liberty

commit fa3fb5e61f1648bc94b426451136da763b1cc73f
Author: Nikola Dipanov <email address hidden>
Date: Fri Oct 16 17:18:56 2015 +0100

    Don't track migrations in 'accepted' state

    This state was added during Liberty for evacuate, and since it's the
    first state that is assigned to the migration record on it's creation,
    and before we start using it for resource tracking, there is no reason
    to consider those records as "in progress".

    Change-Id: I2a9bbe87bc1518cee9e43e4eeec1fa585305f423
    Related-bug: 1417667
    (cherry picked from commit 2c1f8e6002850a38c82739839be7245b786c9c0c)

tags: added: in-stable-liberty
Changed in nova:
assignee: Nikola Đipanov (ndipanov) → Sylvain Bauza (sylvain-bauza)
Changed in nova:
assignee: Sylvain Bauza (sylvain-bauza) → sahid (sahid-ferdjaoui)

Change abandoned by Daniel Berrange (<email address hidden>) on branch: master
Review: https://review.openstack.org/286742
Reason: Abadoning since its obsolet & nikola no longer works on nova

Changed in nova:
assignee: sahid (sahid-ferdjaoui) → Sylvain Bauza (sylvain-bauza)
Changed in nova:
assignee: Sylvain Bauza (sylvain-bauza) → sahid (sahid-ferdjaoui)
Changed in nova:
assignee: sahid (sahid-ferdjaoui) → Stephen Finucane (stephenfinucane)
Changed in nova:
assignee: Stephen Finucane (stephenfinucane) → Pawel Koniszewski (pawel-koniszewski)
Changed in nova:
assignee: Pawel Koniszewski (pawel-koniszewski) → sahid (sahid-ferdjaoui)
Changed in nova:
assignee: sahid (sahid-ferdjaoui) → Pawel Koniszewski (pawel-koniszewski)

The overall NUMA topology is affected by this issues including huge pages

summary: - migration/evacuation/rebuild/resize of instance with dedicated cpus
- needs to recalculate cpus on destination
+ migration/evacuation/rebuild/resize of instance with NUMA topology needs
+ to recalculate NUMA topology
Changed in nova:
assignee: Pawel Koniszewski (pawel-koniszewski) → Andrey Volkov (avolkov)
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers