Instance's numa_topology shouldn't be changed in NUMATopologyFilter

Bug #1405359 reported by Liusheng
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Nikola Đipanov
Milestone: 2015.1.0

Bug Description

In change https://review.openstack.org/#/c/133998, instance['numa_topology'] is set every time the filter passes a host. When there are many hosts in the environment, instance['numa_topology'] is therefore overwritten on each successful filter run, and it ends up holding the topology fitted against the last host that passed. But after weighing and random selection, the instance may not boot on that last filtered host. This can lead to boot failures, because the NUMA topology fitted to the "last filtered host" may differ from the one needed on the host that was actually chosen.
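
To illustrate the failure mode, here is a minimal sketch of the problematic pattern (hypothetical names and simplified signatures, not the exact nova code): each candidate host's filter run overwrites the same instance dict.

    # Simplified sketch of the pre-fix behaviour; numa_fit_instance_to_host is
    # assumed here to return the instance topology pinned to this particular
    # host, or None if the instance does not fit.
    def host_passes(host_state, instance, limits):
        fitted = numa_fit_instance_to_host(
            host_state.numa_topology, instance['numa_topology'], limits)
        if fitted is None:
            return False
        # Overwritten on every passing host, so after filtering the instance
        # carries the topology fitted to the *last* host that passed, not the
        # host the scheduler eventually chooses.
        instance['numa_topology'] = fitted
        return True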

Tags: scheduler
Liusheng (liusheng)
Changed in nova:
assignee: nobody → Liusheng (liusheng)
Rui Chen (kiwik-chenrui)
Changed in nova:
assignee: Liusheng (liusheng) → Rui Chen (kiwik-chenrui)
Revision history for this message
Rui Chen (kiwik-chenrui) wrote :

Looks like it makes the selected host consume an instance numa_topology that was calculated against a different fitting host.

Rui Chen (kiwik-chenrui)
tags: added: scheduler
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/149943

Changed in nova:
status: New → In Progress
Changed in nova:
importance: Undecided → Medium
Revision history for this message
Nikola Đipanov (ndipanov) wrote :

So while not perfect, this is not a problem in practice on the compute node, as the Claim class will always re-calculate the NUMA topology based on the instance data, so setting it makes no difference for the compute. See:

https://github.com/openstack/nova/blob/2176ba7881e4ccae107bb6e614f8854b87f60a65/nova/compute/manager.py#L2175

Also, in the filters we set it on the instance_dict that is part of the request_spec - this never even makes it to the compute nodes.

That said - this bug does make consume_from_instance() account for a potentially wrong topology (it subtracts the one calculated for the last potential host, which may or may not be the one that was chosen and that we are consuming from). This in turn can cause requests for multiple instances with NUMA to fail and hit retries more than they need to.

So it is definitely a bug, it is just limited to exhibiting itself for multi-instance requests.
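
A rough sketch of the accounting problem described above (a simplified, hypothetical version of HostState.consume_from_instance, not the exact nova code):

    # Usage is subtracted using whatever topology is currently attached to
    # the instance dict at consume time.
    def consume_from_instance(host_state, instance):
        instance_topology = instance.get('numa_topology')
        if instance_topology and host_state.numa_topology:
            # If instance_topology was fitted against a different host during
            # filtering, the usage subtracted here will not match this host's
            # real layout, so later instances in the same multi-instance
            # request can fail to fit and trigger scheduler retries.
            host_state.numa_topology = numa_usage_from_instances(
                host_state.numa_topology, [instance_topology])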

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/156930

Changed in nova:
assignee: Rui Chen (kiwik-chenrui) → Nikola Đipanov (ndipanov)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Rui Chen (<email address hidden>) on branch: master
Review: https://review.openstack.org/149943

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/156930
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c206d162febe7f12d84a072bda8c33fd408a343b
Submitter: Jenkins
Branch: master

commit c206d162febe7f12d84a072bda8c33fd408a343b
Author: Nikola Dipanov <email address hidden>
Date: Wed Feb 18 10:29:37 2015 +0100

    Set instance NUMA topology on HostState

    NUMATopologyFilter will try to fit an instance onto every host
    (represented by a HostState instance) so assigning the resulting
    instance topology to the instance dict really makes no sense, as we end
    up with only the last calculated topology from all the filter runs.

    This in turn makes consume_from_instance not work as expected, as it
    will consume NUMA topology calculated from the last host the filter was
    run on, not the host that was chosen by the scheduler.

    This patch stashes the calculated NUMA topology onto the HostState
    instance passed to the filter, that will be used in
    consume_from_instance, and makes sure that is what gets used for
    updating the usage.

    Change-Id: Ifacccadf73dc114e50f46b8e6087ffb2b2fc9d6b
    Closes-Bug: #1405359
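
As a hedged sketch of the approach the commit message describes (simplified, with a hypothetical attribute name, not verbatim nova code): the fitted topology is stored on the per-host HostState rather than on the shared instance dict, so usage accounting only ever sees the topology calculated for the host that was actually selected.

    # Post-fix pattern: stash the fitted result on this host's HostState.
    def host_passes(host_state, instance, limits):
        fitted = numa_fit_instance_to_host(
            host_state.numa_topology, instance['numa_topology'], limits)
        if fitted is None:
            return False
        # Hypothetical attribute name; consume_from_instance can then read
        # the topology from the chosen host's own HostState instead of from
        # the shared instance dict.
        host_state.instance_numa_topology = fitted
        return True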

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → kilo-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-3 → 2015.1.0