Server Rescue leads to VM ERROR state if VM base image is deleted

Bug #2002606 reported by Maxim Monin
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Low
Assigned to: Unassigned
Milestone: none listed

Bug Description

A server keeps a dependency on the original base image used to create or rebuild it. If that image is deleted from Glance, Nova is left with a server.image.id reference to a non-existent image. Rescuing the server in stable device rescue mode then fails with a driver error and leaves the server in ERROR state.

Steps to reproduce:
1. Create a server from a Glance image.
2. Delete the original Glance image.
3. Rescue the server, passing a rescue image (CD) that has the hw_rescue_device and hw_rescue_bus image properties defined.
Second use case:
1. Restore a server from a backup (via server rebuild).
2. Delete the backup image.
3. Rescue the server in stable device mode.

Result: Instance 73706a9a-e976-4024-8068-c439404ec953 cannot be rescued: Driver Error: Image be0e302d-79bd-44e1-93a5-68661e93b43a could not be found.
Result2: VM in ERROR state.

Expected: VM returned to its original state, or left in the shutdown state.
Expected2: I see no reason to keep a dependency on the original image for any server operation. I tested all other server operations, and they all succeeded even with the base image deleted. The libvirt driver code shows that rescue mode uses only one property from image_meta, to determine the bus type of the attached device. So I see no reason to raise an exception if the image_meta data is empty.
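
To make the "no exception on empty image_meta" idea concrete, here is a minimal, self-contained Python sketch. It is not Nova code: ImageNotFound is a local stand-in for nova.exception.ImageNotFound, lookup_image_properties and rescue_bus_for are hypothetical helpers, and both the hw_disk_bus property name and the "virtio" default are assumptions chosen only for illustration.

```python
class ImageNotFound(Exception):
    """Local stand-in for nova.exception.ImageNotFound."""


def lookup_image_properties(image_store, image_ref):
    """Hypothetical image lookup: raises ImageNotFound for a deleted image."""
    try:
        return image_store[image_ref]
    except KeyError:
        raise ImageNotFound(f"Image {image_ref} could not be found.")


def rescue_bus_for(image_store, image_ref, default_bus="virtio"):
    """Pick the bus for the rescue device, tolerating a deleted base image."""
    try:
        properties = lookup_image_properties(image_store, image_ref)
    except ImageNotFound:
        # The base image is gone: continue with empty metadata
        # instead of failing the whole rescue operation.
        properties = {}
    # Assumption: the bus is read from a single image property.
    return properties.get("hw_disk_bus", default_bus)
```

With this shape, a missing base image only means the default bus is used; the rescue itself would not be aborted.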

Stack trace for OpenStack Zed:
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [None req-6611c9f1-63f1-497a-a1bc-51e770d2df09 a27cbf048299483592d84f13e9fe9797 b7df78883aad4ef7ba0a2778c7741462 - - 8a096fed8060480f8f906f57fca19780 8a096fed8060480f8f906f57fca19780] [instance: 73706a9a-e976-4024-8068-c439404ec953] Error trying to Rescue Instance: nova.exception.ImageNotFound: Image be0e302d-79bd-44e1-93a5-68661e93b43a could not be found.
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] Traceback (most recent call last):
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] File "/usr/lib/python3/dist-packages/nova/image/glance.py", line 285, in show
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] image = self._client.call(context, 2, 'get', args=(image_id,))
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] File "/usr/lib/python3/dist-packages/nova/image/glance.py", line 191, in call
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] result = getattr(controller, method)(*args, **kwargs)
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] File "/usr/lib/python3/dist-packages/glanceclient/v2/images.py", line 197, in get
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] return self._get(image_id)
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] File "/usr/lib/python3/dist-packages/glanceclient/common/utils.py", line 670, in inner
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] return RequestIdProxy(wrapped(*args, **kwargs))
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] File "/usr/lib/python3/dist-packages/glanceclient/v2/images.py", line 190, in _get
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] resp, body = self.http_client.get(url, headers=header)
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] File "/usr/lib/python3/dist-packages/keystoneauth1/adapter.py", line 395, in get
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] return self.request(url, 'GET', **kwargs)
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] File "/usr/lib/python3/dist-packages/glanceclient/common/http.py", line 380, in request
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] return self._handle_response(resp)
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] File "/usr/lib/python3/dist-packages/glanceclient/common/http.py", line 120, in _handle_response
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] raise exc.from_response(resp, resp.content)
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] glanceclient.exc.HTTPNotFound: HTTP 404 Not Found: No image found with ID be0e302d-79bd-44e1-93a5-68661e93b43a
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953]
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] During handling of the above exception, another exception occurred:
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953]
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] Traceback (most recent call last):
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 4485, in rescue_instance
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] self.driver.rescue(context, instance, network_info,
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 4249, in rescue
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] image_meta = objects.ImageMeta.from_image_ref(
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] File "/usr/lib/python3/dist-packages/nova/objects/image_meta.py", line 151, in from_image_ref
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] image_meta = image_api.get(context, image_ref)
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] File "/usr/lib/python3/dist-packages/nova/image/glance.py", line 1205, in get
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] return session.show(context, image_id,
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] File "/usr/lib/python3/dist-packages/nova/image/glance.py", line 287, in show
2023-01-11 07:33:28.761 1207602 ERROR nova.compute.manager [instance: 73706a9a-e976-4024-8068-c439404ec953] _reraise_translated_image_exception(image_id)

Tags: rescue
Maxim Monin (maximmonin)
description: updated
Balazs Gibizer (balazs-gibizer) wrote :

This seems to be a valid bug that exists on master too. At [1] the driver looks up the original image to get the image metadata, but I think we are already stashing that in the instance, so instance.image_meta could be used. We need to make sure, though, that instance.image_meta has not been updated to the rescue image's metadata before this point.

Marking this as valid. Feel free to propose a patch to fix this.

[1] https://github.com/openstack/nova/blob/d8b4b7bebdc0f55353cd99f372044b9e30315a6d/nova/virt/libvirt/driver.py#L4257-L4259

Changed in nova:
status: New → Triaged
importance: Undecided → Low
tags: added: rescue
Maxim Monin (maximmonin) wrote :

Yeah, I checked the rescue/unrescue code and the libvirt driver code.
Server Power On and Server Hard Reboot regenerate the libvirt XML config using instance.image_meta.
instance.image_meta changes during the Server Create and Server Rebuild operations. Server Rescue does not update instance.image_meta.

So it seems we can use:
image_meta = instance.image_meta
instead of:
https://github.com/openstack/nova/blob/d8b4b7bebdc0f55353cd99f372044b9e30315a6d/nova/virt/libvirt/driver.py#L4257-L4268
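
A small self-contained sketch of the difference between the two approaches, for illustration only. Nothing here is real Nova code: FakeImageAPI, FakeInstance and both helper functions are hypothetical, ImageNotFound is a local stand-in for nova.exception.ImageNotFound, and the hw_disk_bus value is an assumed example property.

```python
from dataclasses import dataclass, field


class ImageNotFound(Exception):
    """Local stand-in for nova.exception.ImageNotFound."""


class FakeImageAPI:
    """Hypothetical image service holding metadata keyed by image ID."""

    def __init__(self, images):
        self._images = dict(images)

    def get(self, image_ref):
        if image_ref not in self._images:
            raise ImageNotFound(f"Image {image_ref} could not be found.")
        return self._images[image_ref]

    def delete(self, image_ref):
        del self._images[image_ref]


@dataclass
class FakeInstance:
    image_ref: str
    # Metadata stashed on the instance at create/rebuild time.
    image_meta: dict = field(default_factory=dict)


def image_meta_via_lookup(image_api, instance):
    """Behaviour seen in the trace: re-fetch the base image by reference."""
    return image_api.get(instance.image_ref)


def image_meta_via_instance(image_api, instance):
    """Proposed behaviour: reuse the metadata stored on the instance."""
    return instance.image_meta


if __name__ == "__main__":
    api = FakeImageAPI({"base": {"hw_disk_bus": "virtio"}})
    server = FakeInstance("base", image_meta=dict(api.get("base")))
    api.delete("base")  # the base image is removed after server creation

    print(image_meta_via_instance(api, server))  # still available
    try:
        image_meta_via_lookup(api, server)
    except ImageNotFound as exc:
        print(f"lookup failed: {exc}")
```

The sketch relies on the point made above that rescue does not overwrite instance.image_meta; if that ever changed, the stashed metadata would describe the rescue image rather than the original one.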

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/872385

Changed in nova:
status: Triaged → In Progress