simultaneous boot of multiple instances leads to cpu pinning overlap

Bug #1454451 reported by Chris Friesen
This bug affects 3 people
Affects                     Status         Importance   Assigned to       Milestone
OpenStack Compute (nova)    Fix Released   High         Chris Friesen
  Nominated for Juno by Nikola Đipanov
Kilo                        Fix Released   Undecided    Nikola Đipanov

Bug Description

I'm running into an issue with kilo-3 that I think is present in current trunk. Basically it results in multiple instances (with dedicated cpus) being pinned to the same physical cpus.

I think there is a race between the claimed CPUs of an instance being persisted to the DB, and the resource audit scanning the DB for instances and subtracting pinned CPUs from the list of available CPUs.

The problem only shows up when the following sequence happens:
1) instance A (with dedicated cpus) boots on a compute node
2) resource audit runs on that compute node
3) instance B (with dedicated cpus) boots on the same compute node

So to hit this you need to be booting many instances, limiting the valid compute nodes (host aggregates or server groups), or running on a small cluster.

The nitty-gritty view looks like this:

When booting up an instance we hold the COMPUTE_RESOURCE_SEMAPHORE in compute.resource_tracker.ResourceTracker.instance_claim() and that covers updating the resource usage on the compute node. But we don't persist the instance numa topology to the database until after instance_claim() returns, in compute.manager.ComputeManager._build_instance(). Note that this is done *after* we've given up the semaphore, so there is no longer any sort of ordering guarantee.
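
As a rough sketch of this ordering (illustrative only; the helper names claim_resources(), update_usage() and save_numa_topology(), and the tracker/db objects, are assumptions rather than the real nova code):

    # Minimal sketch of the problematic boot-path ordering described above.
    import threading

    COMPUTE_RESOURCE_SEMAPHORE = threading.Lock()

    def instance_claim(instance, tracker):
        # The semaphore is held only while the in-memory usage is updated.
        with COMPUTE_RESOURCE_SEMAPHORE:
            claim = tracker.claim_resources(instance)  # picks the pCPUs to pin
            tracker.update_usage(claim)
        return claim

    def build_instance(instance, tracker, db):
        claim = instance_claim(instance, tracker)
        # The pinned-CPU (NUMA topology) data only reaches the database here,
        # after the semaphore has been released, so there is a window in which
        # the DB does not yet reflect the claim.
        db.save_numa_topology(instance, claim.numa_topology)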

compute.resource_tracker.ResourceTracker.update_available_resource() then acquires COMPUTE_RESOURCE_SEMAPHORE, queries the database for a list of instances and uses that to calculate a new view of what resources are available. If the numa topology of the most recent instance hasn't been persisted yet, then the new view of resources won't include any pCPUs pinned by that instance.
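
The audit side, in the same illustrative terms (get_instances_on_host(), all_pcpus and pinned_cpus are likewise assumed names, and COMPUTE_RESOURCE_SEMAPHORE is the lock from the sketch above):

    def update_available_resource(tracker, db, host):
        # Periodic resource audit, heavily simplified.
        with COMPUTE_RESOURCE_SEMAPHORE:
            instances = db.get_instances_on_host(host)
            available_pcpus = set(tracker.all_pcpus)
            for inst in instances:
                # If an instance's numa_topology has not been persisted yet
                # (the window shown above), its pinned CPUs are not subtracted.
                if inst.numa_topology:
                    available_pcpus -= set(inst.numa_topology.pinned_cpus)
            tracker.available_pcpus = available_pcpus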

compute.manager.ComputeManager._build_instance() runs for the next instance and based on the new view of available resources it allocates the same pCPU(s) used by the earlier instance. Boom, overlapping pinned pCPUs.

Lastly, the same bug applies to the compute.manager.ComputeManager.rebuild_instance() case. It uses the same pattern of doing the claim and then updating the instance numa topology after releasing the semaphore.

Tags: compute
Chris Friesen (cbf123)
Changed in nova:
assignee: nobody → Chris Friesen (cbf123)
description: updated
Chris Friesen (cbf123)
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/182766

Changed in nova:
status: New → In Progress
Changed in nova:
assignee: Chris Friesen (cbf123) → Dan Smith (danms)
Changed in nova:
assignee: Dan Smith (danms) → Chris Friesen (cbf123)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/182766
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2427d288bc017a5b91430ffe16419d47703d2060
Submitter: Jenkins
Branch: master

commit 2427d288bc017a5b91430ffe16419d47703d2060
Author: Chris Friesen <email address hidden>
Date: Wed May 13 11:15:25 2015 -0600

    Fix race between resource audit and cpu pinning

    This fixes a race between the claimed CPUs of an instance being
    persisted to the DB, and the resource audit scanning the DB for
    instances and subtracting pinned CPUs from the list of available CPUs.

    The problem only shows up when the following sequence happens:
    1) instance A (with dedicated cpus) boots on a compute node
    2) resource audit runs on that compute node
    3) instance B (with dedicated cpus) boots on the same compute node

    The bug is that the claimed numa topology isn't updated until
    after we release COMPUTE_RESOURCES_SEMAPHORE, so when the resource
    audit retrieves the list of instances the numa_topology hasn't
    been updated yet for the most recent one.

    The fix is to persist the claimed numa topology before releasing
    COMPUTE_RESOURCES_SEMAPHORE.

    Closes-Bug: #1454451
    Co-Authored-By: Dan Smith <email address hidden>
    Change-Id: I553f2e43a68577c83d890c3671380af68f9e725a
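
In the same illustrative terms as the sketches in the bug description (not the actual patch), the fix amounts to moving the persist inside the locked section:

    def instance_claim(instance, tracker, db):
        # Fixed ordering: the claimed NUMA topology is written to the database
        # while COMPUTE_RESOURCE_SEMAPHORE is still held, so the resource audit
        # (which takes the same semaphore) can never see an instance whose
        # claim has not yet been persisted.
        with COMPUTE_RESOURCE_SEMAPHORE:
            claim = tracker.claim_resources(instance)
            tracker.update_usage(claim)
            db.save_numa_topology(instance, claim.numa_topology)
        return claim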

Changed in nova:
status: In Progress → Fix Committed
Changed in nova:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/185591

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/185647

Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/kilo)

Reviewed: https://review.openstack.org/185591
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b13726bcf6b9e6a006ec9bfcde051331741ded5a
Submitter: Jenkins
Branch: stable/kilo

commit b13726bcf6b9e6a006ec9bfcde051331741ded5a
Author: Chris Friesen <email address hidden>
Date: Wed May 13 11:15:25 2015 -0600

    Fix race between resource audit and cpu pinning

    This fixes a race between the claimed CPUs of an instance being
    persisted to the DB, and the resource audit scanning the DB for
    instances and subtracting pinned CPUs from the list of available CPUs.

    The problem only shows up when the following sequence happens:
    1) instance A (with dedicated cpus) boots on a compute node
    2) resource audit runs on that compute node
    3) instance B (with dedicated cpus) boots on the same compute node

    The bug is that the claimed numa topology isn't updated until
    after we release COMPUTE_RESOURCES_SEMAPHORE, so when the resource
    audit retrieves the list of instances the numa_topology hasn't
    been updated yet for the most recent one.

    The fix is to persist the claimed numa topology before releasing
    COMPUTE_RESOURCES_SEMAPHORE.

    Closes-Bug: #1454451
    Co-Authored-By: Dan Smith <email address hidden>
    (cherry picked from commit 2427d288bc017a5b91430ffe16419d47703d2060)

    Conflicts:
     nova/compute/manager.py
     nova/tests/unit/compute/test_resource_tracker.py
     nova/tests/unit/compute/test_tracker.py

    Change-Id: I553f2e43a68577c83d890c3671380af68f9e725a

Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-1 → 12.0.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/juno)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/juno
Review: https://review.openstack.org/185647
Reason: Juno is EOL soon.
