image cache manager removes used backing files on NFS shared storage
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Expired | Undecided | Unassigned |
Bug Description
Description
===========
After a site electrical maintenance (powered off for two days), most of the instances using ephemeral storage fail to start with "Error: Image <id> could not be found."
The backing files for these instances under "/var/lib/" have been removed.
We are using ephemeral storage shared on NFS. Glance images are rebuilt every day, so most instances do not share a common image.
Cause: after power-on, a first compute runs _run_image_
First (storage_
Then (storage_
I think we should wait for at least "image_
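To make the suspected race concrete, here is a minimal, hypothetical Python sketch of how a shared-storage "users" registry can go stale after a long outage. The function names mirror nova's storage-users registry (nova/virt/storage_users.py) and the 24-hour cutoff is an assumption for illustration; this is not nova's actual code.

```python
# Sketch (not nova code): a registry of compute hosts recently seen
# using the shared instances path, with a staleness cutoff.
TWENTY_FOUR_HOURS = 86400  # assumed staleness cutoff, in seconds

def register_storage_use(registry, host, now):
    """Record that `host` touched the shared instances path."""
    registry[host] = now

def get_storage_users(registry, now):
    """Return hosts seen within the staleness window."""
    return [h for h, ts in registry.items() if now - ts < TWENTY_FOUR_HOURS]

# Timeline of the outage: every compute registered before the
# two-day power-off.
poweroff = 1_000_000.0
registry = {}
for host in ("compute1", "compute2", "compute3"):
    register_storage_use(registry, host, poweroff)

# Two days later, only compute1 is back up and re-registers itself.
poweron = poweroff + 2 * TWENTY_FOUR_HOURS
register_storage_use(registry, "compute1", poweron)

# The image cache manager on compute1 now believes it is the only
# user of the shared storage, so base images belonging to the other
# computes' instances look unused and become removal candidates.
print(get_storage_users(registry, poweron))  # ['compute1']
```

Under this model, waiting for all computes to re-register before the first cache-manager pass would restore the full user list.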
Steps to reproduce
==================
Use shared NFS ephemeral storage on all computes.
1. create an instance on each compute, each time from a different glance image
2. remove all these images from glance
3. stop all instances
4. stop all nova-compute services
5. wait for 24 hours
(alternatively, echo '{}' > /var/lib/
6. start all nova-compute services
7. wait for the image cache manager to trigger (~ image_cache_
8. start all instances
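The removal decision these steps provoke can be sketched as a simple age-plus-usage check. This is a hypothetical simplification: `min_age` stands in for nova's `remove_unused_original_minimum_age_seconds` option (default 24 hours), and `unused_base_files` is an illustrative name, not a nova function.

```python
import time

def unused_base_files(base_files, used_images, now, min_age=86400):
    """Base files older than min_age that no known instance references.

    base_files: mapping of base-file name -> last-modified timestamp
    used_images: set of base-file names referenced by known instances
    """
    return [name for name, mtime in base_files.items()
            if name not in used_images and now - mtime >= min_age]

now = time.time()
base_files = {
    "base_a": now - 3 * 86400,  # backs an instance on another compute
    "base_b": now - 3 * 86400,  # backs a local instance
}
used_images = {"base_b"}  # only local instances are known after restart

# base_a is old and apparently unused, so it gets deleted even though
# an instance on another (still-down) compute depends on it.
print(unused_base_files(base_files, used_images, now))  # ['base_a']
```

Because the glance images were rebuilt daily, each base file has exactly one using instance, so any compute whose instances are not yet counted loses its backing files.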
Expected result
===============
All instances start
Actual result
=============
All the instances fail to start, except on one compute
Environment
===========
tested with nova version 10.1.0 and 13.0.2
on libvirt KVM and shared NFS Netapp storage
I think I just experienced the same thing with a GFS2 filesystem configuration.
g-s-s is not keeping old images, so this is going to happen.
2021-07-07 19:19:09.370 76758 INFO nova.virt.libvirt.driver [-] [instance: 0578149e-3cf5-4461-813a-d4c19c8ea219] Instance destroyed successfully.
2021-07-07 19:19:09.396 76758 INFO os_vif [req-df6690bd-fe7f-4fe1-9b63-70b0bb16f323 14e990be9c6d4ee7bc164053cd199b03 4ab34c2f0e2849fabd0f6bc1df763d32 - e09db03a5c734790884ce76e1ccb84e3 e09db03a5c734790884ce76e1ccb84e3] Successfully unplugged vif VIFOpenVSwitch(active=False,address=fa:16:3e:b1:f3:33,bridge_name='br-int',has_traffic_filtering=True,id=a307f72b-bcc1-48d0-bdd8-bdbfa115f06e,network=Network(4450b7ec-63d9-4753-9964-dba9e7b10ae4),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tapa307f72b-bc')
2021-07-07 19:19:10.078 76758 INFO nova.compute.manager [req-df6690bd-fe7f-4fe1-9b63-70b0bb16f323 14e990be9c6d4ee7bc164053cd199b03 4ab34c2f0e2849fabd0f6bc1df763d32 - e09db03a5c734790884ce76e1ccb84e3 e09db03a5c734790884ce76e1ccb84e3] [instance: 0578149e-3cf5-4461-813a-d4c19c8ea219] Successfully reverted task state from powering-on on failure for instance.
2021-07-07 19:19:10.086 76758 ERROR oslo_messaging.rpc.server [req-df6690bd-fe7f-4fe1-9b63-70b0bb16f323 14e990be9c6d4ee7bc164053cd199b03 4ab34c2f0e2849fabd0f6bc1df763d32 - e09db03a5c734790884ce76e1ccb84e3 e09db03a5c734790884ce76e1ccb84e3] Exception during message handling: nova.exception.ImageNotFound: Image c6396d68-72a5-4897-9eb5-d33484d3f925 could not be found.
2021-07-07 19:19:10.086 76758 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2021-07-07 19:19:10.086 76758 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3/dist-packages/nova/image/glance.py", line 375, in download
2021-07-07 19:19:10.086 76758 ERROR oslo_messaging.rpc.server     image_chunks = self._client.call(
2021-07-07 19:19:10.086 76758 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3/dist-packages/nova/image/glance.py", line 190, in call
2021-07-07 19:19:10.086 76758 ERROR oslo_messaging.rpc.server     result = getattr(controller, method)(*args, **kwargs)
2021-07-07 19:19:10.086 76758 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3/dist-packages/glanceclient/common/utils.py", line 628, in inner
2021-07-07 19:19:10.086 76758 ERROR oslo_messaging.rpc.server     return RequestIdProxy(wrapped(*args, **kwargs))
2021-07-07 19:19:10.086 76758 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3/dist-packages/glanceclient/v2/images.py", line 249, in data
2021-07-07 19:19:10.086 76758 ERROR oslo_messaging.rpc.server     resp, image_meta = self.http_client.get(url)
2021-07-07 19:19:10.086 76758 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3/dist-packages/keystoneauth1/adapter.py", line 395, in get
2021-07-07 19:19:10.086 76758 ERROR oslo_messaging.rpc.server     return self.request(url, 'GET', **kwargs)
2021-07-07 19:19:10.086 76758 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3/dist-packages/glanceclient/common/http.py", line 380, in request
2021-07-07 19:19:10.086 76758 ERROR oslo_messaging.rpc.server     return self._handle_response(resp)
2021-07...