OpenStack Compute (nova)

compute node update available resources should always update updated_at

Bug #1153778 reported by Chris Behrens on 2013-03-11

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
OpenStack Compute (nova)	Fix Released	High	Chris Behrens	OpenStack Compute (nova) 2013.1 "grizzly"

Bug Description

The scheduler's host_manager caches capacity changes between resource tracker updates on the compute side. It keeps using a cached entry for as long as the corresponding compute_node['updated_at'] is older than the cache's own timestamp.

Under failure scenarios, this cache can be updated to consume a resource that is never committed on the compute side. A race condition can also cause the scheduler to update its cache at nearly the same time as the compute manager does.

A periodic task on each compute node regularly updates the compute_node entry to keep used resources in sync with the DB. However, when there are no differences, 'updated_at' is not updated in the DB even though compute_node_update() is called. As a result, the scheduler can hold an incorrect view of resources forever, because it never sees a newer 'updated_at' in the compute_node entry.
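
The staleness described above can be illustrated with a small sketch. This is a simplified model, not nova's actual code: the class HostStateSketch, its fields, and the dict standing in for a compute_nodes row are all hypothetical names chosen for illustration.

```python
from datetime import datetime, timedelta


class HostStateSketch:
    """Hypothetical, simplified stand-in for the scheduler's per-host cache."""

    def __init__(self, free_ram_mb):
        self.free_ram_mb = free_ram_mb
        self.updated = None  # time of the last change to this cache entry

    def consume(self, ram_mb, now):
        # The scheduler deducts resources optimistically when it picks a host.
        self.free_ram_mb -= ram_mb
        self.updated = now

    def update_from_compute_node(self, compute_node):
        # Keep the cached view while it is newer than the DB row.
        if self.updated and self.updated > compute_node["updated_at"]:
            return False
        self.free_ram_mb = compute_node["free_ram_mb"]
        self.updated = compute_node["updated_at"]
        return True


t0 = datetime(2013, 3, 11, 12, 0, 0)
row = {"free_ram_mb": 4096, "updated_at": t0}  # assumed shape of a DB row

host = HostStateSketch(free_ram_mb=4096)
host.update_from_compute_node(row)
# The scheduler claims 2048 MB, but assume the build later fails on the compute side.
host.consume(2048, now=t0 + timedelta(seconds=5))

# The periodic task finds nothing changed, so updated_at stays at t0.
refreshed_stale = host.update_from_compute_node(row)
free_after_stale = host.free_ram_mb

# If the periodic task always advances updated_at, the cache is refreshed.
row["updated_at"] = t0 + timedelta(seconds=60)
refreshed_bumped = host.update_from_compute_node(row)
free_after_bump = host.free_ram_mb
```

In this model the cache keeps reporting 2048 MB free until the row's timestamp moves past the cache's own timestamp, which is the condition the bug prevents from ever occurring.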

OpenStack Infra (hudson-openstack) wrote on 2013-03-11: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/24118

Changed in nova:
assignee:	nobody → Chris Behrens (cbehrens)
status:	New → In Progress

Chris Behrens (cbehrens) on 2013-03-12

description:	updated

OpenStack Infra (hudson-openstack) wrote on 2013-03-12: Fix merged to nova (master)

Reviewed: https://review.openstack.org/24118
Committed: http://github.com/openstack/nova/commit/f398b9e195cda582bad57396b097dec274384c07
Submitter: Jenkins
Branch: master

commit f398b9e195cda582bad57396b097dec274384c07
Author: Chris Behrens <email address hidden>
Date: Mon Mar 11 14:02:30 2013 -0700

Force resource updates to update updated_at

When there's no changes in resources, compute_node_update (and other
DB update calls) won't modify 'updated_at'.

'updated_at' is what is used to invalidate the cache in the scheduler's
host_manager. Because of a race with the compute manager, the scheduler
could be out of sync with the compute_nodes table but have a newer time
on its cache.

By always updating 'updated_at' on resource updates, the periodic task
will be sure to invalidate any bad cache the scheduler has.

Fixes bug 1153778

Change-Id: I19b51a5b84f472cd0f4de6460a4edb540cc62da2
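
As a rough illustration of the idea in this commit message, the sketch below contrasts an update that only writes when a column differs with one that always includes a fresh timestamp. It is a minimal model under stated assumptions: the function names and the dict used as a row are hypothetical and do not reproduce nova's DB API.

```python
from datetime import datetime, timedelta


def update_if_changed(row, values, now):
    """Hypothetical model of an update that only touches the row when a column differs."""
    dirty = {k: v for k, v in values.items() if row.get(k) != v}
    if dirty:
        row.update(dirty)
        row["updated_at"] = now  # timestamp moves only alongside a real change
    return bool(dirty)


def update_forcing_timestamp(row, values, now):
    """Same update, but updated_at is always part of the change set."""
    forced = dict(values, updated_at=now)
    return update_if_changed(row, forced, now)


t0 = datetime(2013, 3, 11, 14, 0, 0)
t1 = t0 + timedelta(seconds=60)
row = {"memory_mb_used": 512, "updated_at": t0}

# Periodic task reports the same usage: nothing is written, timestamp stays at t0.
wrote_unforced = update_if_changed(row, {"memory_mb_used": 512}, t1)
stamp_unforced = row["updated_at"]

# With the forced timestamp, the identical report still advances updated_at.
wrote_forced = update_forcing_timestamp(row, {"memory_mb_used": 512}, t1)
stamp_forced = row["updated_at"]
```

With the timestamp forced, every periodic report produces a newer 'updated_at', so a scheduler cache entry that compares timestamps would eventually be replaced by the DB's view.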

Changed in nova:
status:	In Progress → Fix Committed

Thierry Carrez (ttx) on 2013-03-20

Changed in nova:
milestone:	none → grizzly-rc1
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2013-04-04

Changed in nova:
milestone:	grizzly-rc1 → 2013.1

OpenStack Infra (hudson-openstack) wrote on 2013-06-28:

Reviewed: https://review.openstack.org/33853
Committed: http://github.com/openstack/nova/commit/0ed62fb7affbda4a701c2175e95aa6f92038604c
Submitter: Jenkins
Branch: master

commit 0ed62fb7affbda4a701c2175e95aa6f92038604c
Author: Peter Feiner <email address hidden>
Date: Wed Jun 19 21:14:43 2013 +0000

db.compute_node_update: ignore values['update_at']

When individual instances are updated (e.g., during spawn and
terminate), ResourceTracker (in nova.compute.resource_tracker) calls
compute_node_update with values=self.compute_node. Since
self.compute_node is an instance of ComputeNode that was retrieved
from the database, it has updated_at set. Since updated_at is in
values, sqlalchemy doesn't automatically change the record's
updated_at column (see
nova.openstack.common.db.sqlalchemy.models.TimestampMixin). Moreover,
since updated_at is set to the last value's updated_at, updated_at
effectively doesn't change until values without updated_at are sent,
which only happens during the periodic task that calls
ResourceTracker.update_available_resources.

Nova-scheduler relies on ComputeNode.updated_at to keep its model of
available resources up-to-date. In particular, nova-scheduler doesn't
play a role in instance termination, so it doesn't account for freed
resources until ComputeNode.updated_at changes. Thus, between
nova-compute's periodic calls to
ResourceTracker.update_available_resources, nova-scheduler's model of
available resources monotonically decreases. If, for example, a node
has resources for 10 instances, and you manage to boot 10, terminate
10, then attempt to boot another before the end of the period,
nova-scheduler won't schedule the new instance on the vacant node.

Fixes bug #1194900.

Note that f398b9e195cda582bad57396b097dec274384c07 fixed a separate
issue (bug #1153778) related to ComputeNode.update_at being stale.

Change-Id: Ifd1e56edfd811241816970715071876857de80d3
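
The behavior this commit message describes can be modeled in a few lines. This is only a sketch under assumptions: save() is a hypothetical stand-in for an ORM layer that stamps updated_at automatically unless the caller supplies a value, and compute_node_update_sketch() is an illustrative name, not the real nova function.

```python
from datetime import datetime, timedelta


def save(row, values, now):
    """Hypothetical model: auto-stamp updated_at only when the caller did not supply one."""
    row.update(values)
    if "updated_at" not in values:
        row["updated_at"] = now


def compute_node_update_sketch(row, values, now):
    """Drop any caller-supplied updated_at so the automatic stamp always applies."""
    cleaned = {k: v for k, v in values.items() if k != "updated_at"}
    save(row, cleaned, now)


t0 = datetime(2013, 6, 19, 21, 0, 0)
db_row = {"running_vms": 0, "updated_at": t0}

# The tracker's copy was read from the DB, so it carries updated_at == t0.
tracked = dict(db_row)

# Spawn: the stale timestamp travels with the values and suppresses the stamp.
tracked["running_vms"] = 1
save(db_row, tracked, t0 + timedelta(seconds=10))
stamp_after_spawn = db_row["updated_at"]

# Terminate: with updated_at filtered out, the row is stamped with the current time.
tracked["running_vms"] = 0
compute_node_update_sketch(db_row, tracked, t0 + timedelta(seconds=20))
stamp_after_terminate = db_row["updated_at"]
```

In the first call the row's timestamp stays at its old value even though running_vms changed, which matches the stale 'updated_at' the scheduler would see between periodic updates.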

Vish Ishaya (vishvananda) wrote on 2013-12-10:

related bug 1194900

Changed in nova:
importance:	Undecided → High
