Cannot heal leases with missing resources

Bug #1947918 reported by Jason Anderson
This bug affects 1 person

Bug Description

The lease healing mechanism roughly works as follows:

- Periodically, a daemon either wakes up and checks the state of external resources (polling monitor), or receives a notification that an external resource's state has changed (notification monitor).
- When this happens, a list of resources that used to be healthy and are now unhealthy is assembled.
- Resources that are now unhealthy are marked as not reservable. Also, reservations within a healing window (currently 1 hour) are found, and the unhealthy resource is reallocated out of each of them. Reservations outside that window are not updated.
- For resources that are now healthy, mark them as reservable.
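A minimal Python sketch of the flow in the list above, for illustration only. Every name here (`Resource`, `Reservation`, `poll_once`, `find_replacement`, `HEALING_WINDOW`) is hypothetical and not the project's actual implementation. Health is read through a caller-supplied `probe` function, and "within the window" is assumed to mean a reservation that has not ended and starts no later than one hour from now.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Optional

# Hypothetical constant; the report says the window is currently 1 hour.
HEALING_WINDOW = timedelta(hours=1)


@dataclass
class Resource:
    id: str
    healthy: bool = True
    reservable: bool = True


@dataclass
class Reservation:
    id: str
    start: datetime
    end: datetime
    resources: set[str] = field(default_factory=set)
    degraded: bool = False

    def overlaps(self, other: "Reservation") -> bool:
        return self.start < other.end and other.start < self.end


def find_replacement(rsv: Reservation, resources: dict[str, Resource],
                     reservations: list[Reservation]) -> Optional[str]:
    """Return a reservable resource not held by any time-overlapping reservation."""
    busy: set[str] = set()
    for other in reservations:
        if other is not rsv and other.overlaps(rsv):
            busy |= other.resources
    for res in resources.values():
        if res.reservable and res.id not in busy and res.id not in rsv.resources:
            return res.id
    return None


def poll_once(resources: dict[str, Resource], reservations: list[Reservation],
              probe: Callable[[str], bool], now: datetime) -> list[str]:
    """One polling pass: record health transitions, then heal in-window reservations."""
    newly_unhealthy: list[str] = []
    for res in resources.values():
        was_healthy = res.healthy
        res.healthy = probe(res.id)
        if was_healthy and not res.healthy:
            newly_unhealthy.append(res.id)
        # Unhealthy resources become non-reservable; recovered ones reservable again.
        res.reservable = res.healthy

    failed = set(newly_unhealthy)
    for rsv in reservations:
        if rsv.end <= now or rsv.start > now + HEALING_WINDOW:
            continue  # outside the healing window: not touched on this pass
        for rid in sorted(rsv.resources & failed):
            rsv.resources.discard(rid)
            replacement = find_replacement(rsv, resources, reservations)
            if replacement is None:
                rsv.degraded = True  # no spare resource: the lease is degraded
            else:
                rsv.resources.add(replacement)
    return newly_unhealthy
```

In this sketch a degraded reservation is never topped back up when resources recover; marking a resource reservable again does nothing for reservations that already lost capacity.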

In particular, two things DO NOT happen:

1. Reservations outside the healing window do not get another chance to heal unless the resource transitions between healthy and unhealthy again, triggering the above flow once more.
2. If a reservation had an unhealthy resource deallocated, but no replacement was found, the lease is degraded. There is no mechanism to "revive" such leases if more healthy resources re-enter the pool.
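Item (2) describes a missing "revive" step. One way such a step could look is sketched here, under the hypothetical assumption (not from the report) that each reservation records how many resources it originally asked for (`wanted`), so a degraded reservation is simply one holding fewer than that. `Reservation` and `revive_degraded` are illustrative names only. On each pass, degraded reservations that have not ended are topped up from the reservable pool, skipping resources held by reservations that overlap in time.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Reservation:
    id: str
    start: datetime
    end: datetime
    wanted: int  # hypothetical: number of resources the lease originally asked for
    resources: set[str] = field(default_factory=set)

    @property
    def degraded(self) -> bool:
        return len(self.resources) < self.wanted

    def overlaps(self, other: "Reservation") -> bool:
        return self.start < other.end and other.start < self.end


def revive_degraded(reservations: list[Reservation], reservable: set[str],
                    now: datetime) -> list[str]:
    """Top up degraded reservations that have not ended yet.

    Reservations are processed earliest-start first; returns the ids of
    reservations that received at least one resource.
    """
    revived: list[str] = []
    for rsv in sorted(reservations, key=lambda r: r.start):
        if not rsv.degraded or rsv.end <= now:
            continue
        # Resources held by any other reservation overlapping in time are off-limits.
        busy: set[str] = set()
        for other in reservations:
            if other is not rsv and other.overlaps(rsv):
                busy |= other.resources
        candidates = sorted(reservable - busy - rsv.resources)
        added = False
        while rsv.degraded and candidates:
            rsv.resources.add(candidates.pop(0))
            added = True
        if added:
            revived.append(rsv.id)
    return revived
```

Earliest-start-first is just one possible priority order; it favors reservations that are about to start, which arguably need spare resources most urgently.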

Both of the above I would categorize as bugs, or at least unexpected behavior.

Jason Anderson (jasonandersonatuchicago) wrote :

Edit: after looking more closely at the code, (1) does not appear to be true: healing is attempted on every execution of at least the polling monitor. I'm not sure how the notification monitor is supposed to handle this (I can't see any evidence that the notification monitor is even used).

(2) is still a legitimate problem.

Jason Anderson (jasonandersonatuchicago) wrote :

Adding to this (possibly a different bug?), it appears that resources will never heal if they have an instance on them, at least for the physical host plugin.
