"down" nova-compute service spuriously marked as "up" when disabled/enabled
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Low | Chris Friesen |
Bug Description
I think our usage of the "updated_at" field to determine whether a service is "up" or not is buggy. Consider this scenario:
1) nova-compute is happily running and is up/enabled on compute-0
2) something causes nova-compute to stop (process crash, hardware fault, power failure, network isolation, etc.)
3) a minute later, the nova-compute service is reported as "down"
4) I run "nova service-disable compute-0 nova-compute", then "nova service-enable compute-0 nova-compute"
5) nova-compute is now reported as "up" for the next minute, so the scheduler might try to place instances on it. Since the service is not actually available, those requests could be delayed until the RPC timeout expires.
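The scenario above can be reproduced with a small standalone model. This is only a sketch of the reported behaviour, not nova's actual code: the `ServiceRecord` class, the helper names, and the 60-second threshold (chosen to match the "a minute later" in step 3) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed liveness threshold, chosen to match "a minute later" in the report.
SERVICE_DOWN_TIME = timedelta(seconds=60)


@dataclass
class ServiceRecord:
    """Hypothetical stand-in for a row in the services table."""
    host: str
    disabled: bool
    updated_at: datetime


def heartbeat(rec: ServiceRecord, now: datetime) -> None:
    # A periodic status report from the running service touches the row.
    rec.updated_at = now


def set_disabled(rec: ServiceRecord, disabled: bool, now: datetime) -> None:
    # An admin action also writes the row, so updated_at is bumped as well.
    rec.disabled = disabled
    rec.updated_at = now


def is_up(rec: ServiceRecord, now: datetime) -> bool:
    # Liveness is inferred purely from how recently the row was written.
    return now - rec.updated_at <= SERVICE_DOWN_TIME


def simulate() -> list:
    t0 = datetime(2015, 1, 1, 12, 0, 0)
    rec = ServiceRecord(host="compute-0", disabled=False, updated_at=t0)
    heartbeat(rec, t0)                       # step 1: last heartbeat at t0
    results = [is_up(rec, t0 + timedelta(seconds=10))]

    t_down = t0 + timedelta(seconds=90)      # steps 2-3: no heartbeats since t0
    results.append(is_up(rec, t_down))

    set_disabled(rec, True, t_down)          # step 4: disable, then enable
    set_disabled(rec, False, t_down + timedelta(seconds=1))

    # step 5: the dead service looks alive until the threshold passes again
    results.append(is_up(rec, t_down + timedelta(seconds=5)))
    results.append(is_up(rec, t_down + timedelta(seconds=120)))
    return results


if __name__ == "__main__":
    print(simulate())
```

In this model the third check returns `True` even though no heartbeat has arrived since `t0`, which is the spurious "up" state described in step 5.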
I wonder if it would make sense to have a separate "last status timestamp" database field that would only get updated when we get a service status update and not when we change any other fields.
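A minimal sketch of that idea follows, assuming a dedicated timestamp column that only status reports write to. The field name `last_status_at`, the `Service` class, and the 60-second threshold are hypothetical and are not taken from nova's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed liveness threshold (one minute, as in the scenario).
DOWN_AFTER = timedelta(seconds=60)


@dataclass
class Service:
    """Hypothetical service row with a separate status timestamp."""
    host: str
    disabled: bool
    updated_at: datetime       # bumped by any write to the row
    last_status_at: datetime   # bumped only by status reports


def report_status(svc: Service, now: datetime) -> None:
    # A status report from the service updates both timestamps.
    svc.updated_at = now
    svc.last_status_at = now


def set_disabled(svc: Service, disabled: bool, now: datetime) -> None:
    # An admin change touches updated_at but leaves last_status_at alone.
    svc.disabled = disabled
    svc.updated_at = now


def is_alive(svc: Service, now: datetime) -> bool:
    # Liveness depends only on the last real status report.
    return now - svc.last_status_at <= DOWN_AFTER


def demo_fix() -> list:
    t0 = datetime(2015, 1, 1, 12, 0, 0)
    svc = Service("compute-0", False, updated_at=t0, last_status_at=t0)
    report_status(svc, t0)

    t_down = t0 + timedelta(seconds=90)      # service died after t0
    before = is_alive(svc, t_down)

    set_disabled(svc, True, t_down)          # disable, then enable
    set_disabled(svc, False, t_down + timedelta(seconds=1))
    after = is_alive(svc, t_down + timedelta(seconds=5))
    return [before, after]


if __name__ == "__main__":
    print(demo_fix())
```

With the two timestamps separated, the disable/enable cycle no longer affects the liveness check, so the service stays "down" until it actually reports in again.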
Changed in nova: | |
assignee: | nobody → Eric Xie (mark-xiett) |
status: | New → Incomplete |
Changed in nova: | |
assignee: | nobody → Chris Friesen (cbf123) |
description: | updated |
Changed in nova: | |
status: | Confirmed → In Progress |
summary: |
- nova-compute service spuriously marked as "up" when disabled
+ "down" nova-compute service spuriously marked as "up" when disabled/enabled
Changed in nova: | |
milestone: | none → liberty-1 |
status: | Fix Committed → Fix Released |
Changed in nova: | |
milestone: | liberty-1 → 12.0.0 |
Just curious, what is "incomplete" about this? Is there more information that I can provide?