ComputeManager._rebuild_default_impl calls driver.destroy before driver.detach_volume

Bug #2058225 reported by Fabian Wiesel
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description
===========

This is exemplified by a tempest test that fails on master with the vmwareapi driver:
tempest.api.compute.servers.test_server_actions.ServerActionsV293TestJSON.test_rebuild_volume_backed_server
It fails even with the patch from https://review.opendev.org/c/openstack/nova/+/910627 applied.

`ComputeManager._rebuild_default_impl` first calls destroy on the VM in both branches:
- https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L3695-L3701
In the case of a volume-backed VM with `reimage_boot_volume=True`, it then calls `ComputeManager._rebuild_volume_backed_instance` here:
- https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L3710-L3715
That function tries to detach the volume from the destroyed instance and, at least in the VMware driver, raises an `InstanceNotFound`, which I'd argue is to be expected.
- https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L3596-L3607
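
The ordering described above can be illustrated with a minimal, self-contained sketch. This is not nova code: `ToyDriver`, `rebuild_reported_order` and `rebuild_detach_first` are hypothetical names, and the only assumption modelled is that the driver has to look up the VM before it can detach a volume, as the vmwareapi driver does. The second function is there for comparison only and is not a proposed fix.

```python
class InstanceNotFound(Exception):
    """Stand-in for nova.exception.InstanceNotFound (hypothetical toy class)."""


class ToyDriver:
    """Hypothetical driver that must find the VM before detaching a volume."""

    def __init__(self):
        self.vms = {}  # instance uuid -> set of attached volume ids

    def spawn(self, uuid, volumes=()):
        self.vms[uuid] = set(volumes)

    def destroy(self, uuid):
        self.vms.pop(uuid, None)

    def detach_volume(self, uuid, volume_id):
        if uuid not in self.vms:
            raise InstanceNotFound(f"Instance {uuid} could not be found.")
        self.vms[uuid].discard(volume_id)


def rebuild_reported_order(driver, uuid, root_volume):
    """Order described in the report: destroy first, detach second."""
    driver.destroy(uuid)
    driver.detach_volume(uuid, root_volume)  # the VM is already gone here
    driver.spawn(uuid, [root_volume])


def rebuild_detach_first(driver, uuid, root_volume):
    """Comparison only: detach while the VM still exists, then destroy."""
    driver.detach_volume(uuid, root_volume)
    driver.destroy(uuid)
    driver.spawn(uuid, [root_volume])


driver = ToyDriver()
driver.spawn("vm-1", ["root-vol"])
try:
    rebuild_reported_order(driver, "vm-1", "root-vol")
except InstanceNotFound as exc:
    print(exc)
```

In this toy model the first ordering always fails, because nothing recreates the VM between `destroy` and `detach_volume`.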

Steps to reproduce
==================
* Install Devstack from master
* Run tempest test `tempest.api.compute.servers.test_server_actions.ServerActionsV293TestJSON.test_rebuild_volume_backed_server`

Or as a bash script:
```
IMAGE=$(openstack image list -c ID -f value)
ID1=$(openstack server create --flavor 1 --image $IMAGE --boot-from-volume 1 rebuild-1 -c id -f value)
ID2=$(openstack server create --flavor 1 --image $IMAGE --boot-from-volume 1 rebuild-2 -c id -f value)
# Wait for servers to be ready

# Works
openstack server rebuild --os-compute-api-version 2.93 --image $IMAGE $ID1

# Fails
openstack server rebuild --os-compute-api-version 2.93 --reimage-boot-volume --image $IMAGE $ID2

```
Expected result
===============
The test succeeds.

Actual result
=============
The rebuild aborts with a `BuildAbortException` after `driver.detach_volume` raises `InstanceNotFound` (see the traceback in the comments).

Environment
===========
1. Patch proposed in https://review.opendev.org/c/openstack/nova/+/909474
  + Patch proposed in https://review.opendev.org/c/openstack/nova/+/910627

2. Which hypervisor did you use? What's the version of that?

vmwareapi (vSphere 7.0.3 & ESXi 7.0.3)

3. Which storage type did you use?

vmdk on NFS 4.1

4. Which networking type did you use?

networking-nsx-t (https://github.com/sapcc/networking-nsx-t)

Logs & Configs
==============

http://openstack-ci-logs.global.cloud.sap/sapcc-nova--nmllh/index.html

sean mooney (sean-k-mooney) wrote :

did you miss the fact that we detach the block devices here
https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L3672-L3677
before we call destroy

we are only reimaging the root disk so we first detach the root disk

then we destroy the instance
https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L3695-L3701
then we recreate the root disk
https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L3710-L3715

and finally we call spawn.

https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L3727

we don't appear to be trying to detach a volume from a destroyed instance.

the except block you highlighted

https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L3596-L3607
is not related to detaching the volume from a destroyed instance; it is there to handle the exception
raised when a user deletes the VM through the nova REST API during a rebuild.

the InstanceNotFound exception should not be raised for any other reason

Fabian Wiesel (fabian-wiesel) wrote :

> did you miss the fact that we detach the block devices here
https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L3672-L3677
before we call destroy

That's fine, except for the little detail that when `reimage_boot_volume = True`, it becomes `detach_root_bdm = not reimage_boot_volume = False`, so for some reason the root disk does *not* get detached.
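
A small sketch of that flag logic, under the assumption that the up-front detach step skips a block device mapping exactly when it is the root device and `detach_root_bdm` is false. `ToyBDM` and `volumes_detached_before_destroy` are hypothetical names for illustration and are not part of nova.

```python
from dataclasses import dataclass


@dataclass
class ToyBDM:
    """Hypothetical stand-in for a block device mapping."""
    volume_id: str
    is_root: bool = False


def volumes_detached_before_destroy(bdms, reimage_boot_volume):
    """Return the volume ids that would be detached before destroy.

    Assumption: the root BDM is skipped whenever detach_root_bdm is False.
    """
    detach_root_bdm = not reimage_boot_volume
    return [
        bdm.volume_id
        for bdm in bdms
        if detach_root_bdm or not bdm.is_root
    ]


bdms = [ToyBDM("root-vol", is_root=True), ToyBDM("data-vol")]
print(volumes_detached_before_destroy(bdms, reimage_boot_volume=False))
# prints ['root-vol', 'data-vol']
print(volumes_detached_before_destroy(bdms, reimage_boot_volume=True))
# prints ['data-vol']
```

With `reimage_boot_volume=True` the root volume is left attached when destroy runs, so any later detach of it has to happen against an instance that no longer exists.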

> then we recreate the root disk
https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L3710-L3715

> we don't appear to be trying to detach a volume from a destroyed instance.

Then follow the function `_rebuild_volume_backed_instance`.
We hopefully agree that it is called *after* the old instance has been destroyed and *before* the new one has been created.
When I read the code, I see that it calls `_detach_root_volume`, which unsurprisingly calls `driver.detach_volume`.

This now raises an `InstanceNotFound` exception in the vmwareapi driver. And I have trouble seeing how it could do otherwise, considering it is called after the instance has been destroyed and before another one has been created.

Fabian Wiesel (fabian-wiesel) wrote :

> the except block you highlighted [...] is not related to detaching the volume from a destroyed instance; it is there to handle the exception raised when a user deletes the VM through the nova REST API during a rebuild.

Well, but it is the one being raised. I've added the logs. Here is the relevant part:

```
Traceback (most recent call last):
  File "/opt/stack/nova/nova/compute/manager.py", line 4133, in _do_rebuild_instance
    self.driver.rebuild(**kwargs)
  File "/opt/stack/nova/nova/virt/driver.py", line 390, in rebuild
    raise NotImplementedError()
NotImplementedError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/stack/nova/nova/compute/manager.py", line 3591, in _rebuild_volume_backed_instance
    self._detach_root_volume(context, instance, root_bdm)
  File "/opt/stack/nova/nova/compute/manager.py", line 3570, in _detach_root_volume
    with excutils.save_and_reraise_exception():
  File "/opt/stack/data/venv/lib/python3.10/site-packages/oslo_utils/excutils.py", line 227, in __exit__
    self.force_reraise()
  File "/opt/stack/data/venv/lib/python3.10/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
    raise self.value
  File "/opt/stack/nova/nova/compute/manager.py", line 3556, in _detach_root_volume
    self.driver.detach_volume(context, old_connection_info,
  File "/opt/stack/nova/nova/virt/vmwareapi/driver.py", line 552, in detach_volume
    return self._volumeops.detach_volume(connection_info, instance)
  File "/opt/stack/nova/nova/virt/vmwareapi/volumeops.py", line 649, in detach_volume
    self._detach_volume_vmdk(connection_info, instance)
  File "/opt/stack/nova/nova/virt/vmwareapi/volumeops.py", line 569, in _detach_volume_vmdk
    vm_ref = vm_util.get_vm_ref(self._session, instance)
  File "/opt/stack/nova/nova/virt/vmwareapi/vm_util.py", line 1127, in get_vm_ref
    stable_ref.fetch_moref(session)
  File "/opt/stack/nova/nova/virt/vmwareapi/vm_util.py", line 1118, in fetch_moref
    raise exception.InstanceNotFound(instance_id=self._uuid)
nova.exception.InstanceNotFound: Instance d2eb68d3-67bd-49a9-a8a5-2ec03af5cb66 could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/stack/nova/nova/compute/manager.py", line 10848, in _error_out_instance_on_exception
    yield
  File "/opt/stack/nova/nova/compute/manager.py", line 3859, in rebuild_instance
    self._do_rebuild_instance_with_claim(
  File "/opt/stack/nova/nova/compute/manager.py", line 3945, in _do_rebuild_instance_with_claim
    self._do_rebuild_instance(
  File "/opt/stack/nova/nova/compute/manager.py", line 4137, in _do_rebuild_instance
    self._rebuild_default_impl(**kwargs)
  File "/opt/stack/nova/nova/compute/manager.py", line 3714, in _rebuild_default_impl
    self._rebuild_volume_backed_instance(
  File "/opt/stack/nova/nova/compute/manager.py", line 3606, in _rebuild_volume_backed_instance
    raise exception.BuildAbortException(
nova.exception.BuildAbortException: Build of instance d2eb68d3-67bd-49a9-a8a5-2ec03af5cb66 aborted: Failed to rebuild volume backed instance.
```
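
For readers unfamiliar with the "During handling of the above exception, another exception occurred" lines: Python prints them when a new exception is raised inside an `except` block. The sketch below reproduces the last link of the chain with toy classes. The function names mirror the traceback but the bodies are hypothetical, and it assumes the handler catches `InstanceNotFound` and raises `BuildAbortException`, as the traceback suggests.

```python
import traceback


class InstanceNotFound(Exception):
    """Toy stand-in for nova.exception.InstanceNotFound."""


class BuildAbortException(Exception):
    """Toy stand-in for nova.exception.BuildAbortException."""


def detach_root_volume():
    # Stands in for the driver failing to find the destroyed VM.
    raise InstanceNotFound("Instance could not be found.")


def rebuild_volume_backed_instance():
    try:
        detach_root_volume()
    except InstanceNotFound:
        # Raising inside the handler records the original error as __context__.
        raise BuildAbortException("Failed to rebuild volume backed instance.")


last_error = None
try:
    rebuild_volume_backed_instance()
except BuildAbortException as exc:
    last_error = exc
    # Prints both tracebacks, joined by the "During handling ..." line.
    traceback.print_exception(type(exc), exc, exc.__traceback__)
```

The `InstanceNotFound` therefore remains visible as the context of the `BuildAbortException`, which is what the log above shows.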
