Add checks for baremetal node health for ironic

Bug #1946991 reported by Drew Freiberger
This bug affects 1 person
Affects: charm-openstack-service-checks
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

OpenStack Ironic "baremetal nodes" should be monitored for nodes in the "Maintenance=True" state, as well as for nodes with provisioning_state=*failed* or *error*.

For instance, every node should have a provisioning state that is one of the following:

active
available
manageable (this should probably provoke a warning state, as the machine is not consumable by the cloud users)
cleaning
*wait* (such as clean wait, callback wait, etc)

If the status is "error", "clean failed", or "manageable", we should set an alertable state.
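The state rules above could be sketched roughly as follows. This is only an illustration, not code from the charm: the function name `classify_provision_state`, the Nagios-style severity numbers, and the choice to report states not listed in this bug as UNKNOWN are all assumptions.

```python
# Nagios-style exit codes (assumed convention for this sketch).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# States this report lists as healthy; "*wait*" states are matched separately.
HEALTHY_STATES = {"active", "available", "cleaning"}


def classify_provision_state(state: str) -> int:
    """Map an Ironic provisioning state string to a severity level.

    Hypothetical helper: the mapping follows the bug description, and the
    handling of unlisted states (UNKNOWN) is an assumption.
    """
    s = state.strip().lower()
    # provisioning_state=*failed* or *error* should alert.
    if "failed" in s or "error" in s:
        return CRITICAL
    # "manageable" nodes are not consumable by cloud users: warn.
    if s == "manageable":
        return WARNING
    # Listed healthy states, plus any "*wait*" state such as "clean wait".
    if s in HEALTHY_STATES or "wait" in s:
        return OK
    # Anything else is not covered by the report.
    return UNKNOWN
```

Whether "manageable" should be a warning or something stronger, and how unlisted states should be treated, would need to be decided when the check is implemented.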

Likewise, if Maintenance = True, the machine is not available for cloud user consumption, so that should set an alertable state too.

The command to query is "openstack baremetal node list". The checks should be added only if the output of "openstack endpoint list" includes a service with service_name=ironic or service_type=baremetal.
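A rough sketch of that detection and query step, assuming the `openstack` CLI (with the baremetal plugin) is installed and credentials are already in the environment. The helper names are hypothetical, and the JSON keys "Service Name" and "Service Type" are assumed to match the column headers the CLI prints with `-f json`.

```python
import json
import subprocess


def has_ironic(endpoints):
    """Return True if any endpoint row belongs to the Ironic service.

    `endpoints` is the parsed output of `openstack endpoint list -f json`;
    the key names are an assumption based on the CLI's column headers.
    """
    return any(
        ep.get("Service Name") == "ironic" or ep.get("Service Type") == "baremetal"
        for ep in endpoints
    )


def openstack_json(*args):
    """Run an openstack CLI subcommand and parse its JSON output."""
    result = subprocess.run(
        ["openstack", *args, "-f", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def list_baremetal_nodes():
    """Return the baremetal node list, or None if Ironic is not deployed."""
    if not has_ironic(openstack_json("endpoint", "list")):
        return None
    return openstack_json("baremetal", "node", "list")
```

Returning None when Ironic is absent lets the caller skip installing the checks entirely instead of reporting a failure on clouds without baremetal.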

It might be nice to have two checks: one for maintenance mode, which can be silenced, and one that keeps alerting on baremetal nodes that go into 'error' or 'clean failed' for provisioning_state.
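The two-check split might look something like this sketch. The function names, the severity chosen for each check, and the "UUID", "Name", "Provisioning State" and "Maintenance" keys (assumed to match `openstack baremetal node list -f json` output) are illustrative assumptions, not the charm's actual implementation.

```python
# Nagios-style exit codes (assumed convention for this sketch).
OK, WARNING, CRITICAL = 0, 1, 2


def _label(node):
    """Prefer the node name, falling back to its UUID when unnamed."""
    return str(node.get("Name") or node.get("UUID"))


def check_maintenance(nodes):
    """Check 1: nodes with Maintenance=True (can be silenced on its own)."""
    flagged = [_label(n) for n in nodes if n.get("Maintenance")]
    if flagged:
        return WARNING, "WARNING: nodes in maintenance: " + ", ".join(flagged)
    return OK, "OK: no nodes in maintenance"


def check_provision_state(nodes):
    """Check 2: nodes whose provisioning state contains 'failed' or 'error'."""
    flagged = []
    for n in nodes:
        state = (n.get("Provisioning State") or "").lower()
        if "failed" in state or "error" in state:
            flagged.append(f"{_label(n)} ({state})")
    if flagged:
        return CRITICAL, "CRITICAL: nodes in failed state: " + ", ".join(flagged)
    return OK, "OK: no nodes in error or failed states"
```

Each function returns a (status, message) pair, so the two results can be wired to separate Nagios checks and the maintenance one silenced independently.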
