Rescuing RBD volume-backed instance does not work

Bug #1926601 reported by Marius L
This bug affects 4 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Rajesh Tailor

Bug Description

Context:
- OpenStack Victoria deployment
- Separate Ceph pools for Cinder volumes and Nova VMs
- Trying to rescue a volume-backed instance with the --image parameter

The rescue disk is never created in the VMs pool, and the instance is put into the ERROR state with the following exception:

libvirt.libvirtError: internal error: process exited while connecting to monitor: 2021-04-29T11:19:00.527948Z qemu-system-x86_64: -blockdev {"driver":"rbd","pool":"vms","image":"f2dfce55-94cb-43e2-b799-9f9a2671c38b_disk","server":[{"host":"10.0.1.81","port":"6789"},{"host":"10.0.1.82","port":"6789"},{"host":"10.0.1.83","port":"6789"}],"user":"nova","auth-client-required":["cephx","none"],"key-secret":"libvirt-1-storage-secret0","node-name":"libvirt-1-storage","cache":{"direct":false,"no-flush":false},"auto-read-only":true,"discard":"unmap"}: error reading header from f2dfce55-94cb-43e2-b799-9f9a2671c38b_disk: No such file or directory

Stacktrace

2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] Traceback (most recent call last):
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] File "/var/lib/kolla/venv/lib/python3.8/site-packages/nova/compute/manager.py", line 4178, in rescue_instance
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] self.driver.rescue(context, instance, network_info,
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] File "/var/lib/kolla/venv/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 3668, in rescue
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] self._create_guest(
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] File "/var/lib/kolla/venv/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 6637, in _create_guest
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] guest.launch(pause=pause)
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] File "/var/lib/kolla/venv/lib/python3.8/site-packages/nova/virt/libvirt/guest.py", line 158, in launch
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] LOG.error('Error launching a defined domain '
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] self.force_reraise()
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] six.reraise(self.type_, self.value, self.tb)
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] File "/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] raise value
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] File "/var/lib/kolla/venv/lib/python3.8/site-packages/nova/virt/libvirt/guest.py", line 155, in launch
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] return self._domain.createWithFlags(flags)
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] File "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/tpool.py", line 190, in doit
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] result = proxy_call(self._autowrap, f, *args, **kwargs)
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] File "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/tpool.py", line 148, in proxy_call
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] rv = execute(f, *args, **kwargs)
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] File "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/tpool.py", line 129, in execute
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] six.reraise(c, e, tb)
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] File "/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] raise value
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] File "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/tpool.py", line 83, in tworker
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] rv = meth(*args, **kwargs)
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] File "/usr/lib/python3/dist-packages/libvirt.py", line 1265, in createWithFlags
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b] libvirt.libvirtError: internal error: process exited while connecting to monitor: 2021-04-29T11:19:00.527948Z qemu-system-x86_64: -blockdev {"driver":"rbd","pool":"vms","image":"f2dfce55-94cb-43e2-b799-9f9a2671c38b_disk","server":[{"host":"10.0.1.81","port":"6789"},{"host":"10.0.1.82","port":"6789"},{"host":"10.0.1.83","port":"6789"}],"user":"nova","auth-client-required":["cephx","none"],"key-secret":"libvirt-1-storage-secret0","node-name":"libvirt-1-storage","cache":{"direct":false,"no-flush":false},"auto-read-only":true,"discard":"unmap"}: error reading header from f2dfce55-94cb-43e2-b799-9f9a2671c38b_disk: No such file or directory
2021-04-29 11:19:00.774 6 ERROR nova.compute.manager [instance: f2dfce55-94cb-43e2-b799-9f9a2671c38b]

Revision history for this message
Lee Yarwood (lyarwood) wrote :

The issue here appears to be that a legacy request to rescue a boot-from-volume instance is allowed when that should only be possible as a stable device rescue.

https://review.opendev.org/c/openstack/nova/+/701430/23/nova/api/openstack/compute/rescue.py#65

^ it's being unconditionally allowed by the rescue API with the 2.87 microversion.

As a workaround you can add either the hw_rescue_bus or hw_rescue_device image property to the image you're providing; this should lead to a valid stable device rescue attempt that works:

$ openstack image create --file /opt/stack/devstack/files/cirros-0.5.1-x86_64-disk.img --disk-format qcow2 --container-format bare --property hw_rescue_bus=virtio cirros-0.5.1-x86_64-disk_stable_rescue
[..]
$ openstack --os-compute-api-version 2.latest server rescue --image c170c32f-e77d-452e-bf59-5a6d3d1dff30 4d7c93c9-16d2-4315-bf92-7d1cb26b07ed
[..]
$ openstack server list
+--------------------------------------+------+--------+---------------------------------------------------------+--------------------------+---------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------+--------+---------------------------------------------------------+--------------------------+---------+
| 4d7c93c9-16d2-4315-bf92-7d1cb26b07ed | test | RESCUE | private=10.0.0.20, fdb6:3220:5055:0:f816:3eff:fed0:60ee | N/A (booted from volume) | m1.tiny |
+--------------------------------------+------+--------+---------------------------------------------------------+--------------------------+---------+
$ sudo virsh domblklist 4d7c93c9-16d2-4315-bf92-7d1cb26b07ed
 Target Source
----------------------------------------------------------------
 vda volumes/volume-550ea92f-225f-4142-977a-92bc701b0984
 vdb vms/4d7c93c9-16d2-4315-bf92-7d1cb26b07ed_disk.rescue

Lee Yarwood (lyarwood)
Changed in nova:
assignee: nobody → Lee Yarwood (lyarwood)
Lee Yarwood (lyarwood)
Changed in nova:
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Marius L (marius-leus) wrote :

I tried your suggestion, setting --property hw_rescue_bus=virtio on the rescue image.
The error is gone, but I now have another issue.
The VM has one boot volume on / and one additional volume on /home, so after rescuing I get the following mounts:

- /dev/sda1 / - the 1st (boot) volume
- /dev/sdb1 - the rescue device, unmounted
- /dev/sdc1 /home - the 2nd volume

Basically, the rescue image gets written to the sdb device and is not used for booting.

Revision history for this message
Marius L (marius-leus) wrote :

It works only if I add both:
--property hw_rescue_bus=scsi # can be virtio too
--property hw_rescue_device=disk
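Combining the two, the rescue image from the earlier workaround can be created with both properties in one step (a sketch; the file path and image name below are example values, not taken from this bug):

```shell
# Create a rescue image with both stable-rescue properties set.
# File path and image name are example values.
openstack image create \
    --file cirros-0.5.1-x86_64-disk.img \
    --disk-format qcow2 --container-format bare \
    --property hw_rescue_bus=virtio \
    --property hw_rescue_device=disk \
    cirros-stable-rescue

# Rescue the volume-backed server using microversion 2.87 or later.
openstack --os-compute-api-version 2.latest \
    server rescue --image cirros-stable-rescue <server-uuid>
```

These commands assume an authenticated OpenStack environment, so they are illustrative rather than directly runnable here.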

Revision history for this message
yule sun (syle87) wrote :

Hello everyone,
Can I use rescue to repair an instance on Rocky?
When I try the command openstack server rescue $instanceid, I get the same message:
Instance $instanceid cannot be rescued: Cannot rescue a volume-backed instance (HTTP 400)

Revision history for this message
Lee Yarwood (lyarwood) wrote :

Volume-backed rescue wasn't available in Rocky. The only real alternative back then was to detach the volume and reattach it to another instance to perform any recovery or rescue operations.

Changed in nova:
assignee: Lee Yarwood (lyarwood) → Rajesh Tailor (ratailor)
Changed in nova:
status: Triaged → In Progress
Revision history for this message
Kaveh Azizi (azizi-kaveh) wrote :

Dear all,
I am facing the same problem: I created the image with the properties Mr. Yarwood mentioned, but the rescue still fails. I am running OpenStack Xena. Is there anything I am missing in my configuration? I have even added "os_traits=COMPUTE_RESCUE_BFV" to the flavor and the image metadata, but still no progress. Could you please confirm whether this problem is already solved in Nova, and which version I should use? If not, is there any chance of knowing when it will be fully addressed?
Many thanks,
Kaveh

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/852737
Committed: https://opendev.org/openstack/nova/commit/6eed55bf55469f4ceaa7d4d4eb1be635e14bc73b
Submitter: "Zuul (22348)"
Branch: master

commit 6eed55bf55469f4ceaa7d4d4eb1be635e14bc73b
Author: Rajesh Tailor <email address hidden>
Date: Wed Aug 10 18:15:04 2022 +0530

    Fix rescue volume-based instance

    As of now, when attempting to rescue a volume-based instance
    using an image without the hw_rescue_device and/or hw_rescue_bus
    properties set, the rescue api call fails (as non-stable rescue
    for volume-based instances are not supported) leaving the instance
    in error state.

    This change checks for hw_rescue_device/hw_rescue_bus image
    properties before attempting to rescue and if the property
    is not set, then fail with proper error message, without changing
    instance state.

    Related-Bug: #1978958
    Closes-Bug: #1926601
    Change-Id: Id4c8c5f3b32985ac7d3d7c833b82e0876f7367c1
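The guard described in this commit message can be sketched in Python as follows. This is an illustrative sketch only, not Nova's actual code; the function and exception names here are hypothetical:

```python
# Illustrative sketch of the pre-rescue validation described in the commit
# message above. The names validate_bfv_rescue and RescueNotSupported are
# hypothetical, not Nova's actual identifiers.

class RescueNotSupported(Exception):
    """Raised before any instance state is changed."""

def validate_bfv_rescue(is_volume_backed: bool, image_properties: dict) -> None:
    """Reject a volume-backed rescue unless the rescue image supports
    stable device rescue via hw_rescue_device and/or hw_rescue_bus."""
    if not is_volume_backed:
        # Image-backed instances may still use a legacy rescue.
        return
    if not (image_properties.get("hw_rescue_device")
            or image_properties.get("hw_rescue_bus")):
        raise RescueNotSupported(
            "Cannot rescue a volume-backed instance: the rescue image "
            "must set hw_rescue_device and/or hw_rescue_bus")
```

Running a check like this before touching the guest is what keeps a failed rescue attempt from leaving the instance in the ERROR state.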

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/nova/+/872116

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/nova/+/872118

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/875343

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/875344

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/875347

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 27.0.0.0rc1

This issue was fixed in the openstack/nova 27.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/nova/+/872116
Committed: https://opendev.org/openstack/nova/commit/d00a848a735f98b028f5930798ee69ef205c8e2e
Submitter: "Zuul (22348)"
Branch: stable/zed

commit d00a848a735f98b028f5930798ee69ef205c8e2e
Author: Rajesh Tailor <email address hidden>
Date: Wed Aug 10 18:15:04 2022 +0530

    Fix rescue volume-based instance

    As of now, when attempting to rescue a volume-based instance
    using an image without the hw_rescue_device and/or hw_rescue_bus
    properties set, the rescue api call fails (as non-stable rescue
    for volume-based instances are not supported) leaving the instance
    in error state.

    This change checks for hw_rescue_device/hw_rescue_bus image
    properties before attempting to rescue and if the property
    is not set, then fail with proper error message, without changing
    instance state.

    Related-Bug: #1978958
    Closes-Bug: #1926601
    Change-Id: Id4c8c5f3b32985ac7d3d7c833b82e0876f7367c1
    (cherry picked from commit 6eed55bf55469f4ceaa7d4d4eb1be635e14bc73b)

tags: added: in-stable-zed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/nova/+/872118
Committed: https://opendev.org/openstack/nova/commit/4073aa51f79be54e2e6e8143666a7c1f9a00e03d
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 4073aa51f79be54e2e6e8143666a7c1f9a00e03d
Author: Rajesh Tailor <email address hidden>
Date: Wed Aug 10 18:15:04 2022 +0530

    Fix rescue volume-based instance

    As of now, when attempting to rescue a volume-based instance
    using an image without the hw_rescue_device and/or hw_rescue_bus
    properties set, the rescue api call fails (as non-stable rescue
    for volume-based instances are not supported) leaving the instance
    in error state.

    This change checks for hw_rescue_device/hw_rescue_bus image
    properties before attempting to rescue and if the property
    is not set, then fail with proper error message, without changing
    instance state.

    Related-Bug: #1978958
    Closes-Bug: #1926601
    Change-Id: Id4c8c5f3b32985ac7d3d7c833b82e0876f7367c1
    (cherry picked from commit 6eed55bf55469f4ceaa7d4d4eb1be635e14bc73b)
    (cherry picked from commit d00a848a735f98b028f5930798ee69ef205c8e2e)

tags: added: in-stable-yoga
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/nova/+/875343
Committed: https://opendev.org/openstack/nova/commit/c977027b1933e408c58508e883f6a799ffacc4cc
Submitter: "Zuul (22348)"
Branch: stable/xena

commit c977027b1933e408c58508e883f6a799ffacc4cc
Author: Rajesh Tailor <email address hidden>
Date: Wed Aug 10 18:15:04 2022 +0530

    Fix rescue volume-based instance

    As of now, when attempting to rescue a volume-based instance
    using an image without the hw_rescue_device and/or hw_rescue_bus
    properties set, the rescue api call fails (as non-stable rescue
    for volume-based instances are not supported) leaving the instance
    in error state.

    This change checks for hw_rescue_device/hw_rescue_bus image
    properties before attempting to rescue and if the property
    is not set, then fail with proper error message, without changing
    instance state.

    Related-Bug: #1978958
    Closes-Bug: #1926601
    Change-Id: Id4c8c5f3b32985ac7d3d7c833b82e0876f7367c1
    (cherry picked from commit 6eed55bf55469f4ceaa7d4d4eb1be635e14bc73b)
    (cherry picked from commit d00a848a735f98b028f5930798ee69ef205c8e2e)
    (cherry picked from commit 4073aa51f79be54e2e6e8143666a7c1f9a00e03d)

tags: added: in-stable-xena
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 25.1.1

This issue was fixed in the openstack/nova 25.1.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 26.1.1

This issue was fixed in the openstack/nova 26.1.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova xena-eom

This issue was fixed in the openstack/nova xena-eom release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/victoria)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/875347
Reason: stable/victoria branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/victoria if you want to further work on this patch.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/wallaby)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/875344
Reason: stable/wallaby branch of openstack/nova is about to be deleted. To be able to do that, all open patches need to be abandoned. Please cherry pick the patch to unmaintained/wallaby if you want to further work on this patch.
