compute node update available resources should always update updated_at

Bug #1153778 reported by Chris Behrens
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Chris Behrens

Bug Description

The scheduler's host_manager tries to cache capacity changes in between resource tracker updates on the compute side. It keeps using its cached entry for a host for as long as that host's compute_node['updated_at'] is older than the scheduler's cache entry.

Under failure scenarios, this cache can be updated to use a resource that is never committed on the compute side. It's also possible, due to a race condition, that the scheduler updates its cache at nearly the same time as the compute manager updates the DB.

A periodic task runs on compute nodes and repeatedly updates the compute_node's entry to make sure used resources are in sync with the DB.
However, when there are no differences, 'updated_at' does not get updated in the DB, even though compute_node_update() is called. This can result in the scheduler forever having an incorrect view of resources, as it never sees a newer 'updated_at' in the compute_node entry.
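
The failure mode described above can be illustrated with a small, self-contained sketch. This is not nova code: the HostState class, its methods, and the field name free_ram_mb are hypothetical stand-ins chosen for illustration, and the comparison rule is a simplified reading of the description (the cache wins while it is newer than the DB row).

```python
from datetime import datetime, timedelta


class HostState:
    """Hypothetical stand-in for the scheduler's cached view of one host."""

    def __init__(self, free_ram_mb):
        self.free_ram_mb = free_ram_mb
        self.updated = None  # time of the last change to this cache entry

    def update_from_compute_node(self, compute_node):
        # Keep the cached value while the DB row is older than the cache.
        if self.updated and self.updated > compute_node["updated_at"]:
            return
        self.free_ram_mb = compute_node["free_ram_mb"]
        self.updated = compute_node["updated_at"]

    def consume(self, ram_mb, now):
        # The scheduler optimistically claims resources in its cache.
        self.free_ram_mb -= ram_mb
        self.updated = now


t0 = datetime(2013, 3, 11, 12, 0, 0)
row = {"free_ram_mb": 4096, "updated_at": t0}

host = HostState(free_ram_mb=0)
host.update_from_compute_node(row)  # cache now matches the DB row

# The scheduler claims 2048 MB, but the claim is never committed on the
# compute side, so the DB row keeps free_ram_mb=4096.
host.consume(2048, now=t0 + timedelta(seconds=5))

# The periodic task finds no differences and leaves updated_at untouched,
# so the scheduler keeps trusting its own (wrong) cached value.
host.update_from_compute_node(row)
stale_view = host.free_ram_mb

# Once updated_at moves forward, the row is newer than the cache and wins.
row["updated_at"] = t0 + timedelta(seconds=60)
host.update_from_compute_node(row)
fresh_view = host.free_ram_mb
```

In this sketch the cached value only recovers once the row's timestamp moves past the cache's timestamp, which is why a periodic update that leaves 'updated_at' alone cannot repair the scheduler's view.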

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/24118

Changed in nova:
assignee: nobody → Chris Behrens (cbehrens)
status: New → In Progress
Chris Behrens (cbehrens)
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/24118
Committed: http://github.com/openstack/nova/commit/f398b9e195cda582bad57396b097dec274384c07
Submitter: Jenkins
Branch: master

commit f398b9e195cda582bad57396b097dec274384c07
Author: Chris Behrens <email address hidden>
Date: Mon Mar 11 14:02:30 2013 -0700

    Force resource updates to update updated_at

    When there's no changes in resources, compute_node_update (and other
    DB update calls) won't modify 'updated_at'.

    'updated_at' is what is used to invalidate the cache in the scheduler's
    host_manager. Because of a race with the compute manager, the scheduler
    could be out of sync with the compute_nodes table but have a newer time
    on its cache.

    By always updating 'updated_at' on resource updates, the periodic task
    will be sure to invalidate any bad cache the scheduler has.

    Fixes bug 1153778

    Change-Id: I19b51a5b84f472cd0f4de6460a4edb540cc62da2
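
As a rough illustration of the idea in this commit message, the sketch below contrasts an update that only moves the timestamp when a column changes with one that always supplies a fresh timestamp. The functions update_row and update_row_forced and the dict-based "row" are hypothetical and only model the behaviour described; they are not the actual nova DB API or the code in this patch.

```python
from datetime import datetime


def update_row(row, values, now):
    """Toy update: the timestamp moves only when some column changes."""
    changed = {k: v for k, v in values.items() if row.get(k) != v}
    if changed:
        row.update(changed)
        row["updated_at"] = now
    return row


def update_row_forced(row, values, now):
    """Same update, but a fresh updated_at is always part of the values."""
    values = dict(values, updated_at=now)
    return update_row(row, values, now)


t0 = datetime(2013, 3, 11, 12, 0, 0)
t1 = datetime(2013, 3, 11, 12, 1, 0)
row = {"free_ram_mb": 4096, "updated_at": t0}

# Periodic update with identical resource values: nothing changes.
update_row(row, {"free_ram_mb": 4096}, now=t1)
unforced = row["updated_at"]

# Forcing the timestamp makes the row look newer even with no other change.
update_row_forced(row, {"free_ram_mb": 4096}, now=t1)
forced = row["updated_at"]
```

Because the forced variant always changes at least one column, every periodic update produces a newer 'updated_at' for the scheduler to notice.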

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → grizzly-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-rc1 → 2013.1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/33853
Committed: http://github.com/openstack/nova/commit/0ed62fb7affbda4a701c2175e95aa6f92038604c
Submitter: Jenkins
Branch: master

commit 0ed62fb7affbda4a701c2175e95aa6f92038604c
Author: Peter Feiner <email address hidden>
Date: Wed Jun 19 21:14:43 2013 +0000

    db.compute_node_update: ignore values['update_at']

    When individual instances are updated (e.g., during spawn and
    terminate), ResourceTracker (in nova.compute.resource_tracker) calls
    compute_node_update with values=self.compute_node. Since
    self.compute_node is an instance of ComputeNode that was retrieved
    from the database, it has updated_at set. Since updated_at is in
    values, sqlalchemy doesn't automatically change the record's
    updated_at column (see
    nova.openstack.common.db.sqlalchemy.models.TimestampMixin). Moreover,
    since updated_at is set to the last value's updated_at, updated_at
    effectively doesn't change until values without updated_at are sent,
    which only happens during the periodic task that calls
    ResourceTracker.update_available_resources.

    Nova-scheduler relies on ComputeNode.updated_at to keep its model of
    available resources up-to-date. In particular, nova-scheduler doesn't
    play a role in instance termination, so it doesn't account for freed
    resources until ComputeNode.updated_at changes. Thus, between
    nova-compute's periodic calls to
    ResourceTracker.update_available_resources, nova-scheduler's model of
    available resources monotonically decreases. If, for example, a node
    has resources for 10 instances, and you manage to boot 10, terminate
    10, then attempt to boot another before the end of the period,
    nova-scheduler won't schedule the new instance on the vacant node.

    Fixes bug #1194900.

    Note that f398b9e195cda582bad57396b097dec274384c07 fixed a separate
    issue (bug #1153778) related to ComputeNode.update_at being stale.

    Change-Id: Ifd1e56edfd811241816970715071876857de80d3
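
The second problem, where a previously loaded 'updated_at' is passed back in the update values, can be modelled in a few lines. The save and save_ignoring_updated_at functions below are hypothetical and only approximate the behaviour the commit message attributes to the timestamp mixin (auto-stamp unless the caller supplied a value); they are not the real SQLAlchemy or nova code.

```python
from datetime import datetime


def save(row, values, now):
    """Toy model: auto-stamp updated_at unless the caller supplied one."""
    row.update(values)
    if "updated_at" not in values:
        row["updated_at"] = now
    return row


def save_ignoring_updated_at(row, values, now):
    """Drop any caller-supplied updated_at so the auto-stamp always applies."""
    values = {k: v for k, v in values.items() if k != "updated_at"}
    return save(row, values, now)


t0 = datetime(2013, 6, 19, 21, 0, 0)
t1 = datetime(2013, 6, 19, 21, 0, 30)
db_row = {"vcpus_used": 10, "updated_at": t0}

# The resource tracker holds a copy loaded earlier, including updated_at.
loaded = dict(db_row)
loaded["vcpus_used"] = 0  # instances were terminated

# Passing the whole copy back writes the old timestamp over itself.
save(db_row, loaded, now=t1)
stale_stamp = db_row["updated_at"]

# Ignoring the supplied updated_at lets the timestamp advance.
save_ignoring_updated_at(db_row, loaded, now=t1)
fresh_stamp = db_row["updated_at"]
```

In the first call the freed resources reach the row but the timestamp stays at its old value, which matches the commit message's point that the scheduler would not notice the change until the next periodic update.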

Revision history for this message
Vish Ishaya (vishvananda) wrote :

related bug 1194900

Changed in nova:
importance: Undecided → High