Scheduler selects deleted baremetal nodes
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | Fix Released | High | aeva black | 2013.1 |
Bug Description
When a baremetal node is deleted, the associated compute_node record stops receiving periodic updates (but is not actually deleted). However, the scheduler's ComputeFilter seems to be unaware of this and continues to try to assign Nova instances to the deleted node.
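For context, a minimal sketch of the liveness test a ComputeFilter-style filter delegates to the servicegroup API's DB driver (simplified from the grizzly-era behavior, not the actual nova source): it reduces to a timestamp comparison against the service row, with no awareness of individual compute_node records.

```python
from datetime import datetime, timedelta

SERVICE_DOWN_TIME = 60  # seconds; mirrors nova's service_down_time option


def service_is_up(service):
    """A service is "up" if its last heartbeat is recent enough."""
    last_heartbeat = service['updated_at'] or service['created_at']
    return datetime.utcnow() - last_heartbeat <= timedelta(seconds=SERVICE_DOWN_TIME)


def host_passes(service):
    """ComputeFilter-style check: reject disabled or dead services."""
    return not service.get('disabled') and service_is_up(service)
```

Note that nothing here looks at the compute_node record itself, which is why a stale (or deleted) node can still pass the filter.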
To reproduce, start devstack with the baremetal driver, enroll a node (nova baremetal-…), then delete it and attempt to boot an instance.
To see whether this was just a timeout issue, I left devstack running for many hours after deleting the baremetal node, as can be seen from the database records below (some columns snipped for brevity).
[mysql output garbled in extraction: queries against the compute_nodes table (hypervisor_* columns) and the baremetal node table (instance_*/registration_* columns), run many hours after the node was deleted, as of 2013-03-01 16:51:34]
Here is a snippet from n-schd in devstack when calling "nova boot". What I don't understand, and what seems to be causing this issue, is why the servicegroup API believes this compute_node is still up! Note the 'updated_at' value logged by servicegroup.api is recent, whereas in the db, it is much older.
[scheduler log garbled in extraction: DEBUG lines from nova.scheduler and nova.servicegroup at 2013-03-01 16:50:12, showing the servicegroup API reporting the compute_node as up with a recent 'updated_at']
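To make that mismatch concrete, here is a toy illustration (fabricated data; the table and column names only loosely follow the nova schema of that era, so treat them as assumptions): the heartbeat keeps the service-level timestamp fresh, which is what the liveness check consults, while the deleted node's own record ages.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE services (id INTEGER, topic TEXT, updated_at TEXT);
    CREATE TABLE compute_nodes (id INTEGER, service_id INTEGER, updated_at TEXT);
""")
now = datetime(2013, 3, 1, 16, 50, 12)
# The heartbeat keeps refreshing the services row...
conn.execute("INSERT INTO services VALUES (1, 'compute', ?)",
             (now.isoformat(' '),))
# ...while the compute_nodes row for the deleted baremetal node goes stale.
conn.execute("INSERT INTO compute_nodes VALUES (1, 1, ?)",
             ((now - timedelta(hours=9)).isoformat(' '),))

svc_ts, = conn.execute("SELECT updated_at FROM services").fetchone()
node_ts, = conn.execute("SELECT updated_at FROM compute_nodes").fetchone()
print('services.updated_at     :', svc_ts)   # recent -> looks "up"
print('compute_nodes.updated_at:', node_ts)  # hours old -> node is gone
```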
Changed in nova:
importance: Undecided → High
assignee: nobody → Devananda van der Veen (devananda)
milestone: none → grizzly-rc1
tags: added: baremetal
Changed in nova:
status: New → Confirmed
Changed in nova:
status: Fix Committed → Fix Released
Changed in nova:
milestone: grizzly-rc1 → 2013.1
I did some more digging, and it looks like the issue is that (a rough sketch of this heartbeat path follows the list):
- the nova compute service is still running
- servicegroup/drivers/db.py starts a FixedIntervalLoopingCall for self._report_state when the service is started
- which continues to report its state
- which includes a reference to compute_node (specifically node_ref['compute_node'][0], as returned from service_get_by_compute_host)
- and so that compute_node appears to be getting updated every 10 seconds, even when the compute driver knows it is dead.
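The class and method names below follow the description above, but the bodies are simplified stand-ins for illustration, not the real grizzly code:

```python
from datetime import datetime
import threading


def save(row):
    """Stand-in for a DB save; the real call bumps the row's updated_at."""
    row['updated_at'] = datetime.utcnow()


class FixedIntervalLoopingCall:
    """Toy version of nova's looping call: run f every `interval` seconds."""

    def __init__(self, f):
        self.f = f

    def start(self, interval):
        def _tick():
            self.f()
            t = threading.Timer(interval, _tick)
            t.daemon = True
            t.start()
        _tick()


class DbDriver:
    """Sketch of servicegroup/drivers/db.py joining a service to the group."""

    def join(self, service_ref, report_interval=10):
        self._heartbeat = FixedIntervalLoopingCall(
            lambda: self._report_state(service_ref))
        self._heartbeat.start(interval=report_interval)

    def _report_state(self, service_ref):
        # The heartbeat unconditionally re-saves the service row -- and the
        # first joined compute_node, node_ref['compute_node'][0] -- so that
        # node's updated_at is never more than report_interval seconds old,
        # whatever the compute driver knows about it.
        save(service_ref)
        if service_ref.get('compute_node'):
            save(service_ref['compute_node'][0])
```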
I've roughly validated this by observing the following:
- adding additional baremetal nodes doesn't affect which compute_node is included in the servicegroup _report_state RPC call
- deleting baremetal nodes also has no effect on the RPC call
- the scheduler continues to believe a compute_node is online when it is included in the _report_state RPC call, even if the compute driver knows otherwise, because its 'updated_at' value is never more than 10 seconds old.
- deleting the oldest compute_node causes _report_state to include the next-oldest compute_node at the next update interval.
- deleting all the compute_nodes results in the _report_state RPC call properly including no nodes.
I'm convinced at this point that there needs to be a way to inform Nova that a compute_node is dead/deleted, besides merely relying on the last updated_at timestamp for the associated compute service. This should allow deployers to remove baremetal nodes from production without breaking the scheduler's ability to find available nodes.
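As a sketch of what that could look like (the function names and the deleted/deleted_at columns here are illustrative assumptions, not the fix that actually landed): explicitly soft-delete the compute_node row and have the scheduler exclude such rows from its candidate set.

```python
from datetime import datetime


def compute_node_delete(conn, compute_node_id):
    """Explicitly mark a compute_node dead so the scheduler stops using it."""
    conn.execute(
        "UPDATE compute_nodes SET deleted = 1, deleted_at = ? WHERE id = ?",
        (datetime.utcnow().isoformat(' '), compute_node_id))


def schedulable_compute_nodes(conn):
    """Candidate hosts: only compute_nodes not marked deleted."""
    return [row[0] for row in
            conn.execute("SELECT id FROM compute_nodes WHERE deleted = 0")]
```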
An alternative (but I think more complex) solution would be for the servicegroup API to understand that a compute service may have any number of compute nodes (not just one) and then to track their statuses distinctly.
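A sketch of that alternative, under the assumption that the servicegroup API could keep one heartbeat entry per compute_node rather than one per service (the API shape here is invented for illustration):

```python
from datetime import datetime, timedelta


class PerNodeServiceGroup:
    """Hypothetical servicegroup driver tracking each compute_node's liveness."""

    def __init__(self, down_time=60):
        self._last_seen = {}          # (service_id, node_id) -> timestamp
        self._down_time = timedelta(seconds=down_time)

    def report(self, service_id, node_ids):
        """Heartbeat: the service reports only the nodes it still manages."""
        now = datetime.utcnow()
        for node_id in node_ids:
            self._last_seen[(service_id, node_id)] = now

    def node_is_up(self, service_id, node_id):
        """A node deleted from the driver simply stops being reported."""
        seen = self._last_seen.get((service_id, node_id))
        return seen is not None and datetime.utcnow() - seen <= self._down_time
```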