Evacuated instances are not removed from the source

Bug #1947753 reported by Belmiro Moreira
Affects: OpenStack Compute (nova)
Status: Opinion
Importance: Wishlist
Assigned to: Unassigned

Bug Description

Instance "evacuation" is a great feature and we are trying to take advantage of it.
But, it has some limitations, depending how "broken" is the node.

Let me give some context...

In the scenario where the compute node loses connectivity (broken switch port, loose network cable, ...) or nova-compute is stuck (filesystem issue), evacuating instances can have unexpected consequences and lead to data corruption in the application (for example in a DB application).

If a compute node loses connectivity (or an entire set of compute nodes), nova-compute and the instances are "not available".
If the node runs critical applications (let's suppose a MySQL DB), the cloud operator could be tempted to "evacuate" the instance to recover the critical application for the user. At this point the cloud operator may not yet know the cause of the compute node issue, may not be able to shut the node down (management network affected? ...), or may simply not want to interfere with the work of the repair team.

The repair team fixes the issue (it can take a few minutes or hours...) and nova-compute and the instances become available again.

The problem is that nova-compute doesn't destroy the evacuated instances on the source host.

```
2021-10-19 11:17:51.519 3050 WARNING nova.compute.resource_tracker [req-0ed10e35-2715-466a-918b-69eb1fc770e8 - - - - -] Instance fc3be091-56d3-4c69-8adb-2fdb8b0a35d2 has been moved to another host foo.cern.ch(foo.cern.ch). There are allocations remaining against the source host that might need to be removed: {u'resources': {u'VCPU': 1, u'MEMORY_MB': 1875}}.
```

At this point we have 2 instances sharing the same IP and possibly writing into the same volume.

Only when nova-compute is restarted (I guess that was always the assumption... that the compute node was really broken) are the evacuated instances on the affected node removed.

```
2021-10-19 15:39:49.257 21189 INFO nova.compute.manager [req-ded45b0c-20ab-4587-9533-8c613d977f79 - - - - -] Destroying instance as it has been evacuated from this host but still exists in the hypervisor
2021-10-19 15:39:52.949 21189 INFO nova.virt.libvirt.driver [ ] Instance destroyed successfully.
```

I would expect nova-compute to periodically check for evacuated instances and remove them.
Otherwise, this requires a lot of coordination between different support teams.

Should this be moved to a periodic task?
https://github.com/openstack/nova/blob/e14eef0719eceef35e7e96b3e3d242ec79a80969/nova/compute/manager.py#L1440
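For illustration, a minimal sketch of what such a periodic task could look like in nova/compute/manager.py. This is not actual Nova code: the method name and the 600-second spacing are assumptions, and a real change would presumably add a config option for the interval.

```python
from oslo_service import periodic_task


class ComputeManager(manager.Manager):  # existing class; everything else elided

    @periodic_task.periodic_task(spacing=600)  # spacing is an arbitrary example
    def _periodic_destroy_evacuated_instances(self, context):
        # _destroy_evacuated_instances() already exists in Nova: it looks up
        # evacuation migrations with this host as the source and destroys any
        # local guests that were already rebuilt on another host.
        self._destroy_evacuated_instances(context)
```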

I'm running Stein, but looking into the code, we have the same behaviour in master.

Tags: evacuate
Revision history for this message
Wenping Song (wenping1) wrote :

Maybe your evacuation status is not in ['accepted', 'pre-migrating', 'done']; see the bug https://bugs.launchpad.net/nova/+bug/1947812 reported by me.
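For context, a rough sketch of the filter that _destroy_evacuated_instances() applies, paraphrased from the manager.py linked in the description (not a verbatim copy; exact fields vary by release):

```python
# Paraphrased from nova/compute/manager.py:_destroy_evacuated_instances();
# see the link in the bug description. Details vary by release.
filters = {
    'source_compute': self.host,   # this (recovered) compute host
    'status': ['accepted', 'pre-migrating', 'done'],
    'migration_type': 'evacuation',
}
evacuations = objects.MigrationList.get_by_filters(context, filters)
# Local guests matching these migrations are destroyed, but only when this
# method runs, which today is during init_host() at service startup.
```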

Revision history for this message
Belmiro Moreira (moreira-belmiro-email-lists) wrote :

I think the problem is different from https://bugs.launchpad.net/nova/+bug/1947812

In the particular case that I'm describing, "nova-compute" is not stopped, just stuck.
This can happen if there is a RAID issue on the node: the file system is blocked and consequently "nova-compute" is stuck.

Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

OK, let me get it right.

You say that if you want to evacuate an instance, you don't really know whether the original service runs correctly, right?
That's basically why Nova verifies that the host is not operational and is somehow 'failed'.
Sometimes, you're right, Nova thinks the compute service isn't faulty and then you can't evacuate. Other times, Nova thinks the compute service *is* faulty and then you can evacuate.

If you're doing so, then indeed you could have problems *if* the host is actually running.
That's why, in general, we recommend that operators "fence" the original faulty host detected by Nova before evacuating.

Either way, if the service continues to run, it periodically verifies the evacuation status and removes the evacuated instances. So maybe you're hitting a race when you evacuate while a compute fault is transient, and then you see a problem.

If so, I'd recommend, as I said, 'fencing' the host before evacuating instances... or waiting a little bit before evacuating the instances if the issue is transient.
Maybe that's something related to healthchecks that we want to work on: if you had a better status for a faulty compute service, you wouldn't issue evacuations unless you were sure it went down.

Setting the bug report to Opinion, but I'm more than happy to discuss with you, Belmiro, on #openstack-nova if you wish.

Changed in nova:
status: New → Opinion
importance: Undecided → Wishlist
tags: added: evacuate
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

I think nova only destroys evacuated instances during init_host[1] and does not do it periodically. So this is not a race.

I do agree that if instances are evacuated without the compute node being properly fenced, that could lead to VM duplication and corruption.

I think we should at least discuss whether we want to call _destroy_evacuated_instances() from a periodic task to somehow mitigate the issue. But it would be racy, as you noted above.

Another option is to only allow evacuation if the operator has first forced the compute service down via the API, making it explicit that the node needs to be fenced before it can be evacuated. However, this would be an API semantic change.

[1] https://github.com/openstack/nova/blob/00452a403b57723b364477082ce1587a909b2a6b/nova/compute/manager.py#L1440
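For reference on the second option: an operator can already mark a compute service as forced-down through the os-services API (forced_down was introduced in microversion 2.11; the form below is 2.53+). A minimal sketch, where the endpoint URL, token, and service UUID are placeholders:

```python
import requests

NOVA_ENDPOINT = "https://nova.example.com/v2.1"  # placeholder compute endpoint
SERVICE_ID = "<nova-compute-service-uuid>"       # placeholder service UUID
TOKEN = "<keystone-admin-token>"                 # placeholder admin token

# PUT /os-services/{service_id} with forced_down=true (microversion 2.53+)
resp = requests.put(
    f"{NOVA_ENDPOINT}/os-services/{SERVICE_ID}",
    headers={
        "X-Auth-Token": TOKEN,
        "OpenStack-API-Version": "compute 2.53",
        "Content-Type": "application/json",
    },
    json={"forced_down": True},
)
resp.raise_for_status()
print(resp.json()["service"]["forced_down"])  # expect: True
```

Requiring this step before an evacuation is accepted would make the fencing expectation explicit, at the cost of the API semantic change mentioned above.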

Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

What we can do is explain more in our upstream docs that it's the operator's responsibility to ensure the host is down before evacuating.

What we currently say is "Also, you must validate that the current VM host is not operational. Otherwise, the evacuation fails.", which is not entirely accurate: the host status can be down and the evacuation can succeed while the source host is still operating.

https://docs.openstack.org/nova/latest/admin/evacuate.html

Revision history for this message
Belmiro Moreira (moreira-belmiro-email-lists) wrote :

I agree with Sylvain, we should improve the documentation.
(I can take care of that)

The evacuation will succeed if there is an issue in the RabbitMQ/RPC connection to the compute node.
I also think that we should account for those cases. I don't understand the race situation that you mention when running _destroy_evacuated_instances() as a periodic task (is it the interval of time during which both instances can run simultaneously, until the periodic task runs?).

Let's consider another example. There is an issue with a network switch behind a rack.
The compute nodes will be marked as down (ready to evacuate instances). At this point the cloud operators would like to evacuate critical instances... but make sure all the other instances become available as soon as the hardware repair team and the network team fix the issue. Yeah... in these cases we will need a lot of coordination between different teams.

I was just assuming that having _destroy_evacuated_instances() in a periodic task would simplify all of this... For sure, the operator then needs to configure a reasonable periodic task interval that minimises the impact of having two instances running at the same time.
