Nova scheduler not updated immediately when a baremetal node is added or removed

Bug #1248022 reported by Mark McLoughlin
This bug affects 7 people
Affects                    Status    Importance  Assigned to  Milestone
Ironic                     Opinion   Wishlist    Unassigned   -
OpenStack Compute (nova)   Opinion   Medium      Unassigned   -

Bug Description

With the Ironic driver, if a baremetal node is added or deleted, the change is not reflected in the pool of available resources until the next run of update_available_resource(). During this window the scheduler may keep trying to schedule instances on a deleted node, or report NoValidHost for a node that was just added, leading to unnecessary failures and scheduling retries.

In the compute manager, the update_available_resource() periodic task is responsible for updating the scheduler's knowledge of baremetal nodes:

    @periodic_task.periodic_task
    def update_available_resource(self, context):
        ...
        # Ask the virt driver (the Ironic driver here) which nodes it knows about.
        nodenames = set(self.driver.get_available_nodes())
        for nodename in nodenames:
            # Refresh each node's resource tracker so the scheduler sees its
            # current availability.
            rt = self._get_resource_tracker(nodename)
            rt.update_available_resource(context)

update_available_resource() is also called at service startup.

This means you may have to wait up to 60 seconds (the default periodic task spacing) for a node to become available to the scheduler, or to stop being offered after it is removed.
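
As a stopgap only (it narrows the window above but does not remove it), the polling period of that task can be shortened on the nova-compute host driving Ironic. The option name below, update_resources_interval, is an assumption about the Nova release in use; the snippet quoted above uses the bare decorator and therefore the fixed 60-second default:

    # nova.conf sketch for the nova-compute service driving Ironic.
    # Assumes a Nova release where update_available_resource() honours the
    # update_resources_interval option; this only shortens the delay, it does
    # not make node additions/removals visible immediately.
    [DEFAULT]
    # Poll the driver for available nodes every 10 seconds instead of the
    # default 60-second periodic task spacing.
    update_resources_interval = 10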

Revision history for this message
Mark McLoughlin (markmc) wrote :

Evidence of this issue here:

  https://github.com/openstack/tripleo-incubator/blob/e22a2b3/scripts/register-nodes#L27

  echo "Nodes will be available in 60 seconds from now."

:)

tags: added: baremetal
Changed in nova:
status: New → Triaged
importance: Undecided → High
Revision history for this message
aeva black (tenbrae) wrote :

Confirming that this also affects Ironic.

Changed in ironic:
status: New → Triaged
importance: Undecided → Medium
tags: added: nova-driver
Revision history for this message
Dmitry Tantsur (divius) wrote :

This badly affects the TripleO UI; I would like it to be fixed in Juno.

Changed in ironic:
milestone: none → juno-rc1
Revision history for this message
Dmitry Tantsur (divius) wrote :

Merged a duplicate bug into this one. Also reposting a comment from it:

Devananda van der Veen (devananda) wrote on 2014-03-22: #1

I suspect that this is not solvable in the current nova-scheduler architecture, and will require a mechanism for Ironic to actively inform Nova upon resource availability changes (rather than passively wait for Nova to request a list of available resources).

summary: - Nova scheduler not updated immediately when a baremetal node is added
+ Nova scheduler not updated immediately when a baremetal node is added or
+ removed
description: updated
tags: added: ironic
removed: baremetal
aeva black (tenbrae)
Changed in ironic:
milestone: juno-rc1 → none
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Does this still need to be marked as "High"?

Changed in nova:
importance: High → Medium
Revision history for this message
Sean Dague (sdague) wrote :

This is a much bigger architecture issue to address.

Changed in nova:
status: Triaged → Invalid
status: Invalid → Opinion
milan k (vetrisko)
Changed in ironic:
assignee: nobody → milan k (vetrisko)
Revision history for this message
milan k (vetrisko) wrote :

I'd like to suggest the following solution draft; before I submit any patches, though, I'd like feedback on the design.

As stated in the bug description, there is no direct update call from Ironic to the Nova compute Ironic driver; instead, Nova periodically polls for information about available Ironic nodes. The issue is the delay with which node status updates are propagated to the Nova scheduler. A solution may be to update Nova directly instead, and for that purpose I would like to introduce node-state Watchdogs that call back to Nova through registered HTTP requests. While this can be implemented as a small change to Nova (add a refresh REST API URL; a POST to it makes Nova call its resource-update method in turn), it is a bigger change for Ironic.

In general, a Watchdog is configured to fire an Action upon an Event matching the Watchdog's Conditions [1]. A WatchdogManager [1] keeps track of Watchdog objects, provides an interface to add, remove and update them, and dispatches Events to the appropriate Watchdog. Since Events are emitted by nodes changing state (and by nodes being added or removed), every Watchdog object that might fire upon an Event has to be available in every ConductorManager process, because ConductorManager [2] is where the node action emitting the Event is performed. To make the Watchdog objects available in all ConductorManager instances (spread across multiple hosts), the WatchdogManager has to interface with a global store (the database), both to keep the Watchdog objects persistent and to allow centralized Watchdog management. A cache may be used to reduce the number of accesses to the global store; Watchdog objects are expected to "outlive" the nodes here.

Events, on the other hand, are transient and local to the ConductorManager and its node, so one may assume that Events related to a single node are handled "close" to (in the same process as) that node's ConductorManager and need not be available in all ConductorManager instances. Moreover, to keep Event processing close to its source while allowing it to finish asynchronously, the Watchdog Action should fire in the context of the green thread that emits the Event as a result of the node action. An "emits" decorator is therefore introduced to wrap certain ConductorManager async methods with an Event-processing function that dispatches the Events to the related Watchdogs; see the sketch below.
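
For illustration only, a rough sketch of how such an "emits" decorator could look. All names here (emits, watchdog_manager, dispatch, the event dict layout) are hypothetical and not existing Ironic code:

    import functools

    def emits(event_type):
        """Wrap a ConductorManager method so it dispatches an Event afterwards."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(self, context, node_id, *args, **kwargs):
                result = func(self, context, node_id, *args, **kwargs)
                # Still running in the green thread that performed the node
                # action, so the Event is processed close to its source.
                event = {'type': event_type, 'node_id': node_id}
                self.watchdog_manager.dispatch(context, event)  # hypothetical
                return result
            return wrapper
        return decorator

    # Hypothetical usage on a ConductorManager method:
    #
    #     @emits('node.provision_state_changed')
    #     def do_node_deploy(self, context, node_id):
    #         ...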

To expose Watchdog management, a new Pecan hook and controller have to be introduced, possibly reusing the WatchdogManager. I haven't fully worked out this part yet, but I've been thinking along these lines:

Create a watchdog; "type" here is a class to be imported and instantiated while assembling a watchdog:
http://ironic.example.com/api/v1/watchdogs/ <- POST <- {
    'type': 'ironic.watchdog.Watchdog',
    'whitelist': [
        { 'type': 'ironic.watchdog.condition.NodeStateCondition', 'state': 'ironic.states.DEPLOYFAIL' },
    ],
    'action': {
        'type': 'ironic.watchdog.actions.RestRequest',
        'url': 'http://nova.example.com/compute/refresh',
        'method': 'post',
        'auth': ['admin', 'admin'],
        'body_hook': 'ironic.watchdog.hooks.NodeBody'
    },
}

Get...


Revision history for this message
Sam Betts (sambetts) wrote :

There are already plans to rework the Ironic nova driver to support multiple nova-compute services managing the Ironic node pool, and as part of that rework the way available resources are processed is going to change, so you may want to read up on these changes here:

https://etherpad.openstack.org/p/summit-mitaka-ironic-nova-driver
https://review.openstack.org/#/c/194453/11/specs/mitaka/approved/ironic-multiple-compute-hosts.rst

Regarding the watchdog idea, there are now work-in-progress plans to provide a notification bus from Ironic, on which events such as the ones you've described above will be published. This will run on top of the OpenStack messaging system and will use a publish-subscribe model; a rough consumer sketch follows the links below. Information about these plans can be found here:

https://etherpad.openstack.org/p/summit-mitaka-ironic-notifications-bus
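
For illustration only, consuming such a bus might look roughly like the minimal sketch below, assuming the events end up as ordinary oslo.messaging notifications; the topic, event type and endpoint names are placeholders, not the final design:

    import oslo_messaging
    from oslo_config import cfg

    class NodeEventEndpoint(object):
        """Receive node notifications and poke Nova to refresh its resources."""

        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            # event_type might look like 'baremetal.node.provision_set.end'
            # (placeholder); here one would trigger an immediate
            # update_available_resource() instead of waiting for the next poll.
            print('node event: %s %s' % (event_type, payload))

    transport = oslo_messaging.get_notification_transport(cfg.CONF)
    targets = [oslo_messaging.Target(topic='notifications')]
    listener = oslo_messaging.get_notification_listener(
        transport, targets, [NodeEventEndpoint()], executor='threading')
    listener.start()
    listener.wait()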

milan k (vetrisko)
Changed in ironic:
assignee: milan k (vetrisko) → nobody
Revision history for this message
Vladyslav Drok (vdrok) wrote :

This will be fixed if/when Ironic becomes responsible for reporting nodes as resource providers to placement.
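
For illustration only, a minimal sketch of what reporting a node to the placement API as a resource provider with a one-unit custom resource class could look like; the endpoint, token handling and the CUSTOM_BAREMETAL_GOLD class name are assumptions, and this is not the actual Ironic or Nova code:

    import requests

    PLACEMENT = 'http://placement.example.com/placement'  # assumed endpoint
    HEADERS = {
        'X-Auth-Token': 'ADMIN_TOKEN',                     # assumed token
        'OpenStack-API-Version': 'placement 1.20',
    }

    def report_node(node_uuid, node_name):
        # Register the custom resource class (a no-op if it already exists).
        requests.put(PLACEMENT + '/resource_classes/CUSTOM_BAREMETAL_GOLD',
                     headers=HEADERS)
        # Create a resource provider representing the baremetal node.
        requests.post(PLACEMENT + '/resource_providers', headers=HEADERS,
                      json={'uuid': node_uuid, 'name': node_name})
        # Publish an inventory of exactly one unit: a whole node is either
        # free or consumed, so the scheduler sees the change right away.
        requests.put(PLACEMENT + '/resource_providers/%s/inventories' % node_uuid,
                     headers=HEADERS,
                     json={'resource_provider_generation': 0,
                           'inventories': {'CUSTOM_BAREMETAL_GOLD': {
                               'total': 1, 'reserved': 0, 'min_unit': 1,
                               'max_unit': 1, 'step_size': 1,
                               'allocation_ratio': 1.0}}})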

Changed in ironic:
status: Triaged → Opinion
importance: Medium → Wishlist