Nova scheduler not updated immediately when a baremetal node is added or removed

Bug #1248022 reported by Mark McLoughlin
This bug affects 7 people
Affects                    Status    Importance  Assigned to  Milestone
Ironic                     Opinion   Wishlist    Unassigned   -
OpenStack Compute (nova)   Opinion   Medium      Unassigned   -

Bug Description

With the Ironic driver, if a baremetal node is added or deleted, the change is not reflected in the pool of available resources until the next run of update_available_resource(). During this window the scheduler may keep trying to schedule instances on a deleted node, or report NoValidHost for a node that was just added, leading to unnecessary failures and scheduling retries.

In the compute manager, the update_available_resource() periodic task is responsible for updating the scheduler's knowledge of baremetal nodes:

    @periodic_task.periodic_task
    def update_available_resource(self, context):
        ...
        # Ask the virt driver (the Ironic driver here) which nodes it knows about.
        nodenames = set(self.driver.get_available_nodes())
        for nodename in nodenames:
            # Refresh each node's resource tracker so the scheduler sees its
            # current availability.
            rt = self._get_resource_tracker(nodename)
            rt.update_available_resource(context)

update_available_resource() is also called at service startup.

This means you may have to wait up to 60 seconds (the default periodic task spacing) for a node to become available to the scheduler, or to stop being offered after it is removed.
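
As a stopgap only (it narrows the window above but does not remove it), the polling period of that task can be shortened on the nova-compute host driving Ironic. The option name below, update_resources_interval, is an assumption about the Nova release in use; the snippet quoted above uses the bare decorator and therefore the fixed 60-second default:

    # nova.conf sketch for the nova-compute service driving Ironic.
    # Assumes a Nova release where update_available_resource() honours the
    # update_resources_interval option; this only shortens the delay, it does
    # not make node additions/removals visible immediately.
    [DEFAULT]
    # Poll the driver for available nodes every 10 seconds instead of the
    # default 60-second periodic task spacing.
    update_resources_interval = 10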

Revision history for this message
Mark McLoughlin (markmc) wrote :

Evidence of this issue here:

  https://github.com/openstack/tripleo-incubator/blob/e22a2b3/scripts/register-nodes#L27

  echo "Nodes will be available in 60 seconds from now."

:)

tags: added: baremetal
Changed in nova:
status: New → Triaged
importance: Undecided → High
Revision history for this message
aeva black (tenbrae) wrote :

Confirming that this also affects Ironic.

Changed in ironic:
status: New → Triaged
importance: Undecided → Medium
tags: added: nova-driver
Revision history for this message
Dmitry Tantsur (divius) wrote :

This badly affects the TripleO UI; I would like it to be fixed in Juno.

Changed in ironic:
milestone: none → juno-rc1
Revision history for this message
Dmitry Tantsur (divius) wrote :

Merged a duplicate bug into this one. Also reposting a comment from it:

Devananda van der Veen (devananda) wrote on 2014-03-22: #1

I suspect that this is not solvable in the current nova-scheduler architecture, and will require a mechanism for Ironic to actively inform Nova upon resource availability changes (rather than passively wait for Nova to request a list of available resources).

summary: - Nova scheduler not updated immediately when a baremetal node is added
+ Nova scheduler not updated immediately when a baremetal node is added or
+ removed
description: updated
tags: added: ironic
removed: baremetal
aeva black (tenbrae)
Changed in ironic:
milestone: juno-rc1 → none
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Does this still need to be marked as "High"?

Changed in nova:
importance: High → Medium
Revision history for this message
Sean Dague (sdague) wrote :

This is a much bigger architecture issue to address.

Changed in nova:
status: Triaged → Invalid
status: Invalid → Opinion
milan k (vetrisko)
Changed in ironic:
assignee: nobody → milan k (vetrisko)
Revision history for this message
milan k (vetrisko) wrote :

I'd like to suggest the following solution draft; before I submit any patches, though, I'd like feedback on the design.

As stated in the bug description, there is no direct update call from Ironic to the Nova compute Ironic driver; instead, Nova periodically polls for information about available Ironic nodes. The issue is the delay with which node status updates are propagated to the Nova scheduler. A solution may be to update Nova directly instead, and for that purpose I would like to introduce node-state Watchdogs that call back to Nova through registered HTTP requests. While this can be implemented as a small change to Nova (add a refresh REST API URL; a POST to it makes Nova call its resource-update method in turn), it is a bigger change for Ironic.

In general, a Watchdog is configured to fire an Action upon an Event matching the Watchdog's Conditions [1]. A WatchdogManager [1] keeps track of Watchdog objects, provides an interface to add, remove and update them, and dispatches Events to the appropriate Watchdog. Since Events are emitted by nodes changing state (and by nodes being added or removed), every Watchdog object that might fire upon an Event has to be available in every ConductorManager process, because ConductorManager [2] is where the node action emitting the Event is performed. To make the Watchdog objects available in all ConductorManager instances (spread across multiple hosts), the WatchdogManager has to interface with a global store (the database), both to keep the Watchdog objects persistent and to allow centralized Watchdog management. A cache may be used to reduce the number of accesses to the global store; Watchdog objects are expected to "outlive" the nodes here.

Events, on the other hand, are transient and local to the ConductorManager and its node, so one may assume that Events related to a single node are handled "close" to (in the same process as) that node's ConductorManager and need not be available in all ConductorManager instances. Moreover, to keep Event processing close to its source while allowing it to finish asynchronously, the Watchdog Action should fire in the context of the green thread that emits the Event as a result of the node action. An "emits" decorator is therefore introduced to wrap certain ConductorManager async methods with an Event-processing function that dispatches the Events to the related Watchdogs; see the sketch below.
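
For illustration only, a rough sketch of how such an "emits" decorator could look. All names here (emits, watchdog_manager, dispatch, the event dict layout) are hypothetical and not existing Ironic code:

    import functools

    def emits(event_type):
        """Wrap a ConductorManager method so it dispatches an Event afterwards."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(self, context, node_id, *args, **kwargs):
                result = func(self, context, node_id, *args, **kwargs)
                # Still running in the green thread that performed the node
                # action, so the Event is processed close to its source.
                event = {'type': event_type, 'node_id': node_id}
                self.watchdog_manager.dispatch(context, event)  # hypothetical
                return result
            return wrapper
        return decorator

    # Hypothetical usage on a ConductorManager method:
    #
    #     @emits('node.provision_state_changed')
    #     def do_node_deploy(self, context, node_id):
    #         ...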

To expose Watchdog management, a new Pecan hook and controller have to be introduced, possibly reusing the WatchdogManager. I haven't fully worked out this part yet, but I've been thinking along these lines:

Create a watchdog; "type" here is a class to be imported and instantiated while assembling a watchdog:
http://ironic.example.com/api/v1/watchdogs/ <- POST <- {
    'type': 'ironic.watchdog.Watchdog',
    'whitelist': [
        { 'type': 'ironic.watchdog.condition.NodeStateCondition', 'state': 'ironic.states.DEPLOYFAIL' },
    ],
    'action': {
        'type': 'ironic.watchdog.actions.RestRequest',
        'url': 'http://nova.example.com/compute/refresh',
        'method': 'post',
        'auth': ['admin', 'admin'],
        'body_hook': 'ironic.watchdog.hooks.NodeBody'
    },
}

Get...


Revision history for this message
Sam Betts (sambetts) wrote :

There are already plans to rework the Ironic nova driver to support multiple nova-compute services managing the Ironic node pool, and as part of that rework the way available resources are processed is going to change, so you may want to read up on these changes here:

https://etherpad.openstack.org/p/summit-mitaka-ironic-nova-driver
https://review.openstack.org/#/c/194453/11/specs/mitaka/approved/ironic-multiple-compute-hosts.rst

Regarding the watchdog idea, there are now work-in-progress plans to provide a notification bus from Ironic, on which events such as the ones you've described above will be published. This will run on top of the OpenStack messaging system and will use a publish-subscribe model; a rough consumer sketch follows the links below. Information about these plans can be found here:

https://etherpad.openstack.org/p/summit-mitaka-ironic-notifications-bus
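
For illustration only, consuming such a bus might look roughly like the minimal sketch below, assuming the events end up as ordinary oslo.messaging notifications; the topic, event type and endpoint names are placeholders, not the final design:

    import oslo_messaging
    from oslo_config import cfg

    class NodeEventEndpoint(object):
        """Receive node notifications and poke Nova to refresh its resources."""

        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            # event_type might look like 'baremetal.node.provision_set.end'
            # (placeholder); here one would trigger an immediate
            # update_available_resource() instead of waiting for the next poll.
            print('node event: %s %s' % (event_type, payload))

    transport = oslo_messaging.get_notification_transport(cfg.CONF)
    targets = [oslo_messaging.Target(topic='notifications')]
    listener = oslo_messaging.get_notification_listener(
        transport, targets, [NodeEventEndpoint()], executor='threading')
    listener.start()
    listener.wait()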

milan k (vetrisko)
Changed in ironic:
assignee: milan k (vetrisko) → nobody
Revision history for this message
Vladyslav Drok (vdrok) wrote :

This will be fixed if/when Ironic becomes responsible for reporting nodes as resource providers to placement.
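
For illustration only, a minimal sketch of what reporting a node to the placement API as a resource provider with a one-unit custom resource class could look like; the endpoint, token handling and the CUSTOM_BAREMETAL_GOLD class name are assumptions, and this is not the actual Ironic or Nova code:

    import requests

    PLACEMENT = 'http://placement.example.com/placement'  # assumed endpoint
    HEADERS = {
        'X-Auth-Token': 'ADMIN_TOKEN',                     # assumed token
        'OpenStack-API-Version': 'placement 1.20',
    }

    def report_node(node_uuid, node_name):
        # Register the custom resource class (a no-op if it already exists).
        requests.put(PLACEMENT + '/resource_classes/CUSTOM_BAREMETAL_GOLD',
                     headers=HEADERS)
        # Create a resource provider representing the baremetal node.
        requests.post(PLACEMENT + '/resource_providers', headers=HEADERS,
                      json={'uuid': node_uuid, 'name': node_name})
        # Publish an inventory of exactly one unit: a whole node is either
        # free or consumed, so the scheduler sees the change right away.
        requests.put(PLACEMENT + '/resource_providers/%s/inventories' % node_uuid,
                     headers=HEADERS,
                     json={'resource_provider_generation': 0,
                           'inventories': {'CUSTOM_BAREMETAL_GOLD': {
                               'total': 1, 'reserved': 0, 'min_unit': 1,
                               'max_unit': 1, 'step_size': 1,
                               'allocation_ratio': 1.0}}})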

Changed in ironic:
status: Triaged → Opinion
importance: Medium → Wishlist