Comment 6 for bug 1947753

Belmiro Moreira (moreira-belmiro-email-lists) wrote :

I agree with Sylvain, we should improve the documentation.
(I can take care of that)

The evacuation will succeed even if the only problem is the RabbitMQ/RPC connection to the compute node.
I also think that we should account for those cases. I don't understand the race situation that you mention when running _destroy_evacuated_instances() as a periodic task (is it the interval of time during which both instances can be running simultaneously, until the periodic task runs?).

Let's consider another example. There is an issue with a network switch behind a rack.
The compute nodes will be marked as down (ready to evacuate instances). At this point the cloud operators would like to evacuate the critical instances... but have all the other instances become available again as soon as the hardware repair team and the network team fix the issue. Yeah... in these cases we will need a lot of coordination between different teams.

I was just assuming that having _destroy_evacuated_instances() in a periodic task would simplify all of this... Of course, the operator would then need to configure a reasonable periodic task interval that minimises the impact of having two instances running at the same time.
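
To make the idea a bit more concrete, here is a rough sketch in plain Python. It is not Nova's actual code: FakeCompute, run_periodic and the 60 second interval are hypothetical names and values used only for illustration. It assumes the source host can learn which of its local instances were evacuated elsewhere, and it shows that a stale copy can survive for at most one interval after the host is back.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCompute:
    """Hypothetical stand-in for a source compute host (not Nova's manager)."""
    host: str
    local_instances: set = field(default_factory=set)
    destroyed: list = field(default_factory=list)

    def destroy_evacuated_instances(self, evacuated_elsewhere):
        # Remove every local copy whose instance now lives on another host.
        for uuid in sorted(self.local_instances & evacuated_elsewhere):
            self.local_instances.discard(uuid)
            self.destroyed.append(uuid)


def run_periodic(compute, evacuated_elsewhere, interval, now, last_run):
    """Run the cleanup if `interval` seconds have elapsed since `last_run`.

    Returns the timestamp of the most recent run (unchanged if skipped).
    """
    if now - last_run >= interval:
        compute.destroy_evacuated_instances(evacuated_elsewhere)
        return now
    return last_run


if __name__ == "__main__":
    compute = FakeCompute("cn1", {"critical-vm", "vm-2", "vm-3"})
    evacuated = {"critical-vm"}  # only the critical instance was evacuated
    last_run = 0
    for now in (30, 60):  # assumed clock ticks, in seconds
        last_run = run_periodic(compute, evacuated, 60, now, last_run)
        print(now, sorted(compute.local_instances))
```

With this shape the non-evacuated instances stay untouched on the source host, and the duplicate of the evacuated one is removed at the first periodic run after the interval expires, so the interval is the upper bound on how long both copies can coexist.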