masakari

Evacuate flow fails when the VM has been deleted

Bug #2023464 reported by VO LE HUY on 2023-06-11

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	masakari	New	Undecided	Unassigned

Bug Description

Today my lab environment raised this exception: "Resource could not be found". This is a rare case: the compute server happened to reboot right after the request to delete the VM was issued, before it could process that request. The sequence below describes the behavior; note that it involves concurrency:

1) The 'controller' server receives a request to delete a VM (a normal VM or an amphora).
2) The 'compute' server is down, so the request stays unprocessed in the waiting queue. Meanwhile, the 'Masakari Engine' on the 'controller' server has already listed the VMs located on the server that just crashed.
3) The 'compute' server comes back up and processes the request to delete the VM.
4) The 'Masakari Engine' on the 'controller' server continues to the 'Task Evacuate' step in the source code. The line that fetches the VM information through the Nova client, just before the line that spawns the evacuation for that VM, fails with an error saying that the resource cannot be found.

Note: I am on the stable/yoga branch.

https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349
-----------------------------------------------------------------------
for instance_id in instance_list:
        msg = "Evacuation of instance started: '%s'" % instance_id
        self.update_details(msg, 0.5)
->      instance = self.novaclient.get_server(self.context,
                                              instance_id)
        thread_pool.spawn_n(self._evacuate_and_confirm, context,
                            instance, host_name,
                            failed_evacuation_instances,
                            reserved_host)
-----------------------------------------------------------------------
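One possible way to handle this, shown as a minimal sketch rather than a proposed patch: catch the not-found error around the lookup and skip instances that disappeared after the list was built. FakeNova, NotFound, and fetch_evacuable_instances are hypothetical names made up for this illustration; they only stand in for the Nova wrapper and masakari.exception.NotFound seen in the traceback. Whether a deleted instance should be skipped silently or reported is a decision for the maintainers.

```python
# Sketch only: FakeNova and NotFound are hypothetical stand-ins, not the
# real masakari.compute.nova wrapper or masakari.exception.NotFound.


class NotFound(Exception):
    """Stand-in for masakari.exception.NotFound."""


class FakeNova:
    """In-memory stand-in for the Nova wrapper used by the evacuate task."""

    def __init__(self, servers):
        self._servers = dict(servers)

    def delete_server(self, instance_id):
        # Simulates the queued delete request finally being processed.
        self._servers.pop(instance_id, None)

    def get_server(self, context, instance_id):
        try:
            return self._servers[instance_id]
        except KeyError:
            raise NotFound("Resource could not be found.")


def fetch_evacuable_instances(nova, context, instance_ids):
    """Return (found, skipped): servers that still exist, and IDs that vanished."""
    found, skipped = [], []
    for instance_id in instance_ids:
        try:
            found.append(nova.get_server(context, instance_id))
        except NotFound:
            # The instance was deleted after the list was built; skip it
            # instead of letting the exception abort the whole flow.
            skipped.append(instance_id)
    return found, skipped


nova = FakeNova({"vm-1": {"id": "vm-1"}, "vm-2": {"id": "vm-2"}})
instance_list = ["vm-1", "vm-2"]  # step 2: list captured while the host is down
nova.delete_server("vm-2")        # step 3: the queued delete runs after reboot
found, skipped = fetch_evacuable_instances(nova, None, instance_list)
print(found, skipped)
```

Note that this only narrows the window: an instance could still be deleted between the lookup and the evacuate call, so the evacuation step itself would likely need similar handling.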

...
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/masakari/compute/nova.py", ler
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver return nova.servers.get(uuid)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/v2/servers.py", l
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver return self._get("/servers/%s" % base.getid(server), "server")
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/base.py", line 35
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver resp, body = self.api.client.get(url)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/keystoneauth1/adapter.py", l
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver return self.request(url, 'GET', **kwargs)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/client.py", line
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver raise exceptions.from_response(resp, body, url, method)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver masakari.exception.NotFound: Resource could not be found.
...


VO LE HUY (huyvl3) wrote on 2023-06-11:

Attachment: "Crash evacuate flow cause be get deleted vm without try catch" (28.3 KiB, image/png)

description:	updated
tags:	added: before evacuate
tags:	added: not-found-resource-before-evacuate removed: before evacuate found not resource

