Cached images incorrectly removed after instance storage comes back online after a prolonged >= 24 hour outage
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Invalid | Low | Unassigned |
Bug Description
After a prolonged outage of >= 24 hours, any cached images stored on shared instance storage are prone to removal, as compute nodes race to complete a pass of the cache manager once the storage returns.
Each pass of the cache manager first registers the current node as an active user of the instance store, then compiles a list of instances on the hosts registered to that instance store. This list is then used to determine which of the cached images can be safely removed.
After a prolonged outage of >= 24 hours, the first compute node to run a cache manager pass will find only itself listed as an active user of the instance store. It can therefore, and likely will, remove cached images for instances hosted on other compute nodes.
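The race described above can be illustrated with a small simulation. This is only a simplified model under stated assumptions, not nova's actual code: the function name, the registry dictionary of last-seen timestamps, and the 24 hour activity window (taken from the ">= 24 hours" in this report) are all hypothetical.

```python
# Simplified model of the race; not nova's actual code. All names are hypothetical.
DAY = 24 * 60 * 60  # assumed activity window, from the ">= 24 hours" in the report


def cache_manager_pass(node, now, registry, images_by_host, cached_images):
    """Return the cached images this pass would consider safe to remove."""
    # Step 1: register the current node as an active user of the instance store.
    registry[node] = now
    # Step 2: only hosts seen within the window count as active users.
    active_hosts = [h for h, seen in registry.items() if now - seen < DAY]
    # Step 3: collect the images backing instances on those active hosts.
    in_use = set()
    for host in active_hosts:
        in_use.update(images_by_host.get(host, []))
    # Step 4: every other cached image looks unused.
    return sorted(set(cached_images) - in_use)


registry = {"node-a": 0, "node-b": 0}  # both nodes last registered at t=0
images_by_host = {"node-a": ["img-1"], "node-b": ["img-2"]}
cached = ["img-1", "img-2"]

# Normal operation: both nodes registered recently, so nothing is removable.
normal = cache_manager_pass("node-a", 3600, dict(registry), images_by_host, cached)
# After a 25 hour outage, node-a runs first and sees only itself as active.
after_outage = cache_manager_pass("node-a", 25 * 3600, dict(registry), images_by_host, cached)
```

In this model, `after_outage` contains `img-2` even though node-b still hosts an instance that depends on it, because node-b has not yet had the chance to re-register.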
IMHO additional care should be taken before calling for the removal of cached images for instances on registered but seemingly inactive compute nodes.
Changed in nova:
assignee: nobody → Anseela M M (anseela-m00)
tags: added: image-cache
Changed in nova:
assignee: Anseela M M (anseela-m00) → nobody
Changed in nova:
status: Confirmed → Invalid
* Adding tag "compute" as it affects the compute manager at [1].
> After a prolonged outage of >= 24 hours [...]
This is due to the config option remove_unused_original_minimum_age_seconds = 86400, the default.
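To illustrate how an age threshold of 86400 seconds lines up with the 24 hour figure, here is a minimal sketch of such a check. The helper name and its arguments are hypothetical, and the comparison is an assumption about how a minimum-age option of this kind is typically applied rather than a copy of nova's implementation.

```python
# Hypothetical sketch of an age-threshold check; not nova's implementation.
MINIMUM_AGE_SECONDS = 86400  # value quoted for remove_unused_original_minimum_age_seconds


def old_enough_to_remove(last_used, now, minimum_age=MINIMUM_AGE_SECONDS):
    """Return True once an unused image has aged past the threshold."""
    return (now - last_used) >= minimum_age


# An image last used just before a 23 hour outage is still protected...
protected = old_enough_to_remove(last_used=0, now=23 * 3600)
# ...but after a 24 hour outage it has crossed the threshold.
expired = old_enough_to_remove(last_used=0, now=24 * 3600)
```

Under these assumptions, any image that went unused for the whole outage becomes eligible for removal as soon as the outage reaches 24 hours.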
References:
[1] https://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py?id=56a8fe0cc7339ea08e3044406d67341a616eb843#n6683
[2] http://docs.openstack.org/developer/nova/sample_config.html