Cannot heal leases with missing resources
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Blazar | New | Undecided | Unassigned |
Bug Description
The lease healing mechanism roughly works as follows:
- Periodically, a daemon wakes up and checks the state of external resources (polling monitor), or it receives a notification that an external resource's state has changed (notification monitor).
- When this happens, a list of resources whose health has changed is assembled: resources that used to be healthy and are now unhealthy, and vice versa.
- Resources that are now unhealthy are marked as not reservable. In addition, reservations within a certain window of time (currently 1 hour) that use such a resource have it reallocated out of the reservation, i.e. replaced with a healthy resource if one is available. Reservations beyond that window are not updated.
- Resources that are now healthy are marked as reservable again.
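To make the flow above concrete, here is a minimal sketch of one monitor pass in Python. It is not Blazar's code: `Reservation`, `Pool`, `heal` and `HEALING_WINDOW` are hypothetical names, reservations are assumed to carry a single start time, and time overlap between reservations is ignored when looking for spare resources.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta

# All names here are hypothetical; Blazar's real classes and functions differ.
HEALING_WINDOW = timedelta(hours=1)  # the report says the window is currently 1 hour


@dataclass
class Reservation:
    id: str
    start: datetime
    resources: set[str]
    degraded: bool = False


@dataclass
class Pool:
    reservable: dict[str, bool]  # resource id -> reservable flag
    reservations: list[Reservation] = field(default_factory=list)

    def spare_resources(self) -> list[str]:
        """Reservable resources not held by any reservation (time overlap is ignored)."""
        in_use = set().union(*(r.resources for r in self.reservations))
        return sorted(rid for rid, ok in self.reservable.items() if ok and rid not in in_use)


def heal(pool: Pool, changes: dict[str, bool], now: datetime) -> None:
    """One monitor pass; `changes` maps resource id -> new health (True = healthy)."""
    # Unhealthy resources become non-reservable, healthy ones reservable again.
    for rid, healthy in changes.items():
        pool.reservable[rid] = healthy
    failed = {rid for rid, healthy in changes.items() if not healthy}
    for res in pool.reservations:
        # Reservations starting after the healing window are left untouched.
        if res.start > now + HEALING_WINDOW:
            continue
        for rid in sorted(res.resources & failed):
            # Reallocate: drop the failed resource, then try to find a replacement.
            res.resources.discard(rid)
            spare = pool.spare_resources()
            if spare:
                res.resources.add(spare[0])
            else:
                res.degraded = True  # no replacement found: lease stays degraded
```

In this model, if a pool holds a single resource `h1` and it fails, the reservation using it ends up degraded with no resources. A later pass with `{"h1": True}` only flips the reservable flag back; nothing in `heal` looks at degraded reservations again, which mirrors problem (2) below.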
In particular, there are a few things that DO NOT happen:
1. Reservations outside the healing window do not get another chance to heal unless the resource transitions between healthy and unhealthy again, triggering the above flow once more.
2. If an unhealthy resource is deallocated from a reservation but no replacement is found, the lease is left degraded. There is no mechanism to "revive" such leases when healthy resources later re-enter the pool.
I would categorize both of the above as bugs, or at least as unexpected behavior.
Edit: after looking more closely at the code, (1) does not appear to be true, as healing is attempted on every run of at least the polling monitor. I'm not sure how the notification monitor is supposed to handle this (I can't see any evidence that the notification monitor is even used).
(2) is still a legitimate problem.
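For illustration only, one possible shape for the missing "revive" step is an extra pass that tops up degraded reservations from healthy, unallocated resources. This is a hypothetical sketch, not a proposed Blazar patch: `Reservation`, `wanted` and `revive_degraded` are invented names, a reservation is assumed to record how many resources it asked for, and time overlap between reservations is ignored.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Reservation:
    id: str
    wanted: int          # hypothetical: number of resources the lease asked for
    resources: set[str]  # resources currently allocated to it

    @property
    def degraded(self) -> bool:
        return len(self.resources) < self.wanted


def revive_degraded(reservations: list[Reservation], reservable: dict[str, bool]) -> list[str]:
    """Hypothetical pass: refill degraded reservations; return ids that became whole."""
    in_use = set().union(*(r.resources for r in reservations))
    spare = sorted(rid for rid, ok in reservable.items() if ok and rid not in in_use)
    revived = []
    for res in reservations:
        if not res.degraded:
            continue
        # Hand out spare healthy resources until the reservation is whole or none remain.
        while res.degraded and spare:
            res.resources.add(spare.pop(0))
        if not res.degraded:
            revived.append(res.id)
    return revived
```

Running such a pass after every monitor execution (rather than only when a resource changes state) would let a lease recover as soon as enough healthy resources return, even partially refilling it in the meantime.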