NUMA scheduling will not attempt to pack an instance onto a host

Bug #1386236 reported by Andrew Theurer
This bug affects 2 people
Affects                     Status         Importance   Assigned to       Milestone
OpenStack Compute (nova)    Fix Released   High         Nikola Đipanov
Juno                        Fix Released   High         Nikola Đipanov

Bug Description

When creating a flavor which includes "hw:numa_nodes": "1", all instances booted with this flavor are always pinned to NUMA node0. Multiple instances end up on node0 and no instances are on node1. Our expectation was that instances would be balanced across NUMA nodes.

To recreate:

1) Ensure you have a compute node with at least 2 sockets
2) Create a flavor with vcpus and memory which fits within one socket
3) Add the flavor key: nova flavor-key <flavor> set hw:numa_nodes=1
4) Boot more than one instance
5) Verify where the vcpus are pinned
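
For step 5, one way to check the pinning is to look at the libvirt domain XML of each instance (for example, the output of `virsh dumpxml <domain>`). The sketch below reads the `<cputune>` and `<numatune>` elements from such XML. The sample XML and the helper name `pinning_from_domain_xml` are illustrative assumptions, not output captured from the reporter's system.

```python
import xml.etree.ElementTree as ET

# Illustrative domain XML only; real output has many more elements.
SAMPLE_XML = """
<domain type='kvm'>
  <name>instance-00000001</name>
  <cputune>
    <vcpupin vcpu='0' cpuset='0-7'/>
    <vcpupin vcpu='1' cpuset='0-7'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
</domain>
"""


def pinning_from_domain_xml(xml_text):
    """Return ({vcpu: host cpuset string}, memory nodeset or None)."""
    root = ET.fromstring(xml_text.strip())
    vcpu_pins = {
        int(pin.get("vcpu")): pin.get("cpuset")
        for pin in root.findall("./cputune/vcpupin")
    }
    memory = root.find("./numatune/memory")
    nodeset = memory.get("nodeset") if memory is not None else None
    return vcpu_pins, nodeset


if __name__ == "__main__":
    pins, nodeset = pinning_from_domain_xml(SAMPLE_XML)
    print("vcpu pins:", pins)
    print("memory nodeset:", nodeset)
```

If every instance booted from the flavor reports the same memory nodeset and the same host CPU range, the instances are all sitting on one host NUMA node, which is the behaviour described in this report.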

Revision history for this message
Daniel Berrange (berrange) wrote :

The current NUMA code in Juno has a mistakenly limited bit of logic whereby guest NUMA node N is *always* placed on host NUMA node N. Talking with Nikola, this should be fairly straightforward to rectify and he indicates he'll fix this while working on the CPU pinning work.

Changed in nova:
assignee: nobody → Nikola Đipanov (ndipanov)
Joe Gordon (jogo)
Changed in nova:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Nikola Đipanov (ndipanov) wrote :

Giving a bit more detail here - the fix will consist of 3 parts:

1) Add the logic that will try to pack the instances onto cores. Similar logic has already been added to the CPU pinning patches (currently under review, so likely to change) - namely this: https://review.openstack.org/#/c/129266/11/nova/virt/hardware.py

The proposal is to add a 'get_pinning_for_instance' method to VirtNUMAHostTopology that will go through all the available permutations of its own cells, of the same length as the instance cells, and find one that fits. It will then assign those IDs to the instance cells. (Currently we just default to ascending order from 0, which is what causes this bug.)

2) Make sure that the scheduler is using this data in consume_from_instance

3) Make sure that the compute service, once it recalculates the placement on the boot request, saves it in the DB.
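
The permutation search proposed in part 1 can be sketched roughly as follows. This is a minimal illustration, not the nova code: the `HostCell` and `InstanceCell` classes, the `cell_fits` and `consume` helpers, and the no-overcommit fit rule are all assumptions made for the example. Only the name `get_pinning_for_instance` is borrowed from the proposal above.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostCell:  # hypothetical stand-in for a host NUMA cell with usage
    id: int
    cpus: int
    memory_mb: int
    cpus_used: int = 0
    memory_used_mb: int = 0


@dataclass
class InstanceCell:  # hypothetical stand-in for a guest NUMA cell
    cpus: int
    memory_mb: int
    host_cell_id: Optional[int] = None


def cell_fits(host_cell, instance_cell):
    """Assumed fit rule: no overcommit of CPUs or memory within a cell."""
    return (
        host_cell.cpus_used + instance_cell.cpus <= host_cell.cpus
        and host_cell.memory_used_mb + instance_cell.memory_mb
        <= host_cell.memory_mb
    )


def get_pinning_for_instance(host_cells, instance_cells):
    """Return host cell ids for the first permutation that fits, or None."""
    for candidate in itertools.permutations(host_cells, len(instance_cells)):
        if all(cell_fits(h, i) for h, i in zip(candidate, instance_cells)):
            return [h.id for h in candidate]
    return None


def consume(host_cells, instance_cells, pinning):
    """Record the chosen placement on both the instance and the host."""
    by_id = {cell.id: cell for cell in host_cells}
    for inst_cell, host_id in zip(instance_cells, pinning):
        inst_cell.host_cell_id = host_id
        by_id[host_id].cpus_used += inst_cell.cpus
        by_id[host_id].memory_used_mb += inst_cell.memory_mb


if __name__ == "__main__":
    host = [HostCell(0, 8, 16384), HostCell(1, 8, 16384)]
    for n in range(3):
        instance = [InstanceCell(cpus=6, memory_mb=8192)]
        pinning = get_pinning_for_instance(host, instance)
        print("instance", n, "->", pinning)
        if pinning is not None:
            consume(host, instance, pinning)
```

With two 8-CPU host cells and single-cell 6-CPU instances, the first instance lands on cell 0, the second on cell 1, and the third finds no fit. Always assigning guest cell N to host cell N would instead have tried cell 0 for every instance.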

summary: - NUMA scheduling broken
+ NUMA scheduling will not attempt to pack an instance onto a host
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/133946

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/133998

Changed in nova:
assignee: Nikola Đipanov (ndipanov) → sahid (sahid-ferdjaoui)
Changed in nova:
assignee: sahid (sahid-ferdjaoui) → Nikola Đipanov (ndipanov)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/135403

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/133946
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d13205fb6036a6c7d66de350cb226dd0f9ee12d9
Submitter: Jenkins
Branch: master

commit d13205fb6036a6c7d66de350cb226dd0f9ee12d9
Author: Nikola Dipanov <email address hidden>
Date: Wed Nov 12 13:43:27 2014 +0100

    Add support for fitting instance NUMA nodes onto a host

    This commit adds the methods needed to enable fitting instances onto
    NUMA nodes. It adds fit_instance_to_host() method to the
    VirtNUMAHostTopology class that will do the fitting returning the
    instance topology with its cells assigned to the cells of a given host.

    This method will be used in the scheduler and claims and will obsolete
    the need for claim_test method which will be removed in subsequent
    commits.

    It is worth noting that after we transition filter and claims to use the
    methods added to this patch - it will no longer be possible for an
    NUMA-aware instance to be over-committed against itself no matter what the
    over-subscription ratios are.

    Change-Id: I5fb6814778c2790cdd8892f756a33763b8f4a712
    Partial-bug: #1386236
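
One plausible reading of the over-commit note in this commit message is sketched below. The assumption is that a fit check first compares an instance cell against the raw capacity of the host cell, and only afterwards applies an over-subscription ratio to the combined usage. The function and parameter names are hypothetical and are not taken from nova.

```python
def fits_cell(host_cpus, host_cpus_used, instance_cpus, cpu_ratio):
    """Hypothetical CPU fit check for one instance cell on one host cell."""
    # An instance cell bigger than the physical cell never fits, whatever
    # the ratio: its own vCPUs would have to share host CPUs with each other.
    if instance_cpus > host_cpus:
        return False
    # Otherwise, sharing with other instances is allowed up to the ratio.
    return host_cpus_used + instance_cpus <= host_cpus * cpu_ratio


if __name__ == "__main__":
    # 12 vCPUs in one guest cell on an 8-CPU host cell: rejected even at 16x.
    print(fits_cell(8, 0, 12, 16.0))
    # 8 vCPUs on an 8-CPU cell that already hosts 8 vCPUs: allowed at 2x.
    print(fits_cell(8, 8, 8, 2.0))
```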

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/135403
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a59e1a9c7e54efaadc39d366772972463855dfc7
Submitter: Jenkins
Branch: master

commit a59e1a9c7e54efaadc39d366772972463855dfc7
Author: Nikola Dipanov <email address hidden>
Date: Tue Nov 18 20:45:28 2014 +0100

    Make Instance.save() update numa_topology

    This is needed so that we can actually update the given topology with
    the updated data after a successful claim.

    Deleting it will also be needed when we actually make the resize work
    properly for instances with NUMA topology, so we add it here as well.

    We do not expose the new InstanceNUMATopology methods as @remotable to
    avoid having to bump the object version thus making this an easier
    backport target. This is OK since they are only called from
    Instance.save() which is @remotable, and can be trivially made remotable
    should this be needed later (causing a version bump that need not be
    backported).

    Change-Id: I64ff2d00ca20bd065bb17ebaa9c40b64b8cbb817
    Partial-bug: #1386236

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/133998
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=53099f3bf23d0d160fc690a90cf4f32506adf076
Submitter: Jenkins
Branch: master

commit 53099f3bf23d0d160fc690a90cf4f32506adf076
Author: Nikola Dipanov <email address hidden>
Date: Wed Nov 12 17:14:01 2014 +0100

    Instances with NUMA will be packed onto hosts

    This patch makes the NUMATopologyFilter and instance claims on the
    compute host use instance fitting logic to allow for actually packing
    instances onto NUMA capable hosts.

    This also means that the NUMA placement that is calculated during a
    successful claim will need to be updated in the database to reflect the
    host NUMA cell ids the instance cells will be pinned to.

    Using fit_instance_to_host() to decide whether an instance can land
    on a host makes the NUMATopologyFilter code cleaner as it now fully
    re-uses all the logic in VirtNUMAHostTopology and
    VirtNUMATopologyCellUsage classes.

    Change-Id: Ieabafea73b4d566f4194ca60be38b6415d8a8f3d
    Closes-bug: #1386236

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/137683

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/137685

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/137686

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/juno)

Reviewed: https://review.openstack.org/137683
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=27d071f44f080d50ac291de2cb9385934b400ccd
Submitter: Jenkins
Branch: stable/juno

commit 27d071f44f080d50ac291de2cb9385934b400ccd
Author: Nikola Dipanov <email address hidden>
Date: Wed Nov 12 13:43:27 2014 +0100

    Add support for fitting instance NUMA nodes onto a host

    This commit adds the methods needed to enable fitting instances onto
    NUMA nodes. It adds fit_instance_to_host() method to the
    VirtNUMAHostTopology class that will do the fitting returning the
    instance topology with its cells assigned to the cells of a given host.

    This method will be used in the scheduler and claims and will obsolete
    the need for claim_test method which will be removed in subsequent
    commits.

    It is worth noting that after we transition filter and claims to use the
    methods added to this patch - it will no longer be possible for an
    NUMA-aware instance to be over-committed against itself no matter what the
    over-subscription ratios are.

    Partial-bug: #1386236
    (cherry picked from commit d13205fb6036a6c7d66de350cb226dd0f9ee12d9)

    Conflicts:
     nova/tests/unit/virt/test_hardware.py

    Change-Id: I5fb6814778c2790cdd8892f756a33763b8f4a712

tags: added: in-stable-juno
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/137685
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ccb7ef2b017edd1d192b597310c0688e690a9175
Submitter: Jenkins
Branch: stable/juno

commit ccb7ef2b017edd1d192b597310c0688e690a9175
Author: Nikola Dipanov <email address hidden>
Date: Tue Nov 18 20:45:28 2014 +0100

    Make Instance.save() update numa_topology

    This is needed so that we can actually update the given topology with
    the updated data after a successful claim.

    Deleting it will also be needed when we actually make the resize work
    properly for instances with NUMA topology, so we add it here as well.

    We do not expose the new InstanceNUMATopology methods as @remotable to
    avoid having to bump the object version thus making this an easier
    backport target. This is OK since they are only called from
    Instance.save() which is @remotable, and can be trivially made remotable
    should this be needed later (causing a version bump that need not be
    backported).

    Partial-bug: #1386236
    (cherry picked from commit a59e1a9c7e54efaadc39d366772972463855dfc7)

    Conflicts:
     nova/tests/unit/objects/test_instance.py

    Change-Id: I64ff2d00ca20bd065bb17ebaa9c40b64b8cbb817

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/137686
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ee00c8015ca2c71095ffd87c190a47f22c4f73fb
Submitter: Jenkins
Branch: stable/juno

commit ee00c8015ca2c71095ffd87c190a47f22c4f73fb
Author: Nikola Dipanov <email address hidden>
Date: Wed Nov 12 17:14:01 2014 +0100

    Instances with NUMA will be packed onto hosts

    This patch makes the NUMATopologyFilter and instance claims on the
    compute host use instance fitting logic to allow for actually packing
    instances onto NUMA capable hosts.

    This also means that the NUMA placement that is calculated during a
    successful claim will need to be updated in the database to reflect the
    host NUMA cell ids the instance cells will be pinned to.

    Using fit_instance_to_host() to decide whether an instance can land
    on a host makes the NUMATopologyFilter code cleaner as it now fully
    re-uses all the logic in VirtNUMAHostTopology and
    VirtNUMATopologyCellUsage classes.

    Closes-bug: #1386236
    (cherry picked from commit 53099f3bf23d0d160fc690a90cf4f32506adf076)

    Conflicts:
     nova/compute/manager.py
     nova/tests/unit/compute/test_claims.py
     nova/tests/unit/compute/test_resource_tracker.py
     nova/virt/hardware.py

    Change-Id: Ieabafea73b4d566f4194ca60be38b6415d8a8f3d

Thierry Carrez (ttx)
Changed in nova:
milestone: none → kilo-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-1 → 2015.1.0