masakari

Activity log for bug #2023464

Date	Who	What changed	Old value	New value	Message
2023-06-11 03:12:22	VO LE HUY	bug			added bug
2023-06-11 03:12:22	VO LE HUY	attachment added		Crash evacuate flow cause be get deleted vm without try catch https://bugs.launchpad.net/bugs/2023464/+attachment/5679066/+files/get_deleted_vm_will_crash_flow.PNG
2023-06-11 03:12:44	VO LE HUY	description	Today my lab environment got exception like this: "Resource could not be found". My very rare case where the server reboots too fast ('reboot'), specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed. 3) The 'compute' server is back up and running the request to delete the VM. 4) The 'Maskari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. Right at the line of code that gets the VM information through the Nova SDK, there is an error right before the 'spawning evacuate' line for that VM, of course the error will be described that the resource cannot be found. Note: I'm based on Yoga branch. https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349 ----------------------------------------------------------------------- for instance_id in instance_list: \| msg = "Evacuation of instance started: '%s'" % instance_id \| self.update_details(msg, 0.5) \| instance = self.novaclient.get_server(self.context, \| instance_id) \| thread_pool.spawn_n(self._evacuate_and_confirm, context, \| instance, host_name, \| failed_evacuation_instances, \| reserved_host) \| -----------------------------------------------------------------------	Today my lab environment got exception like this: "Resource could not be found". My very rare case where the server reboots too fast ('reboot'), specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. 
The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed. 3) The 'compute' server is back up and running the request to delete the VM. 4) The 'Maskari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. Right at the line of code that gets the VM information through the Nova SDK, there is an error right before the 'spawning evacuate' line for that VM, of course the error will be described that the resource cannot be found. Note: I'm based on Yoga branch. https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349 ----------------------------------------------------------------------- for instance_id in instance_list: msg = "Evacuation of instance started: '%s'" % instance_id self.update_details(msg, 0.5) instance = self.novaclient.get_server(self.context, instance_id) thread_pool.spawn_n(self._evacuate_and_confirm, context, instance, host_name, failed_evacuation_instances, reserved_host) -----------------------------------------------------------------------
2023-06-11 03:16:36	VO LE HUY	tags	found not resource	before evacuate found not resource
2023-06-11 03:16:50	VO LE HUY	tags	before evacuate found not resource	not-found-resource-before-evacuate
2023-06-11 03:24:01	VO LE HUY	description	Today my lab environment got exception like this: "Resource could not be found". My very rare case where the server reboots too fast ('reboot'), specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed. 3) The 'compute' server is back up and running the request to delete the VM. 4) The 'Maskari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. Right at the line of code that gets the VM information through the Nova SDK, there is an error right before the 'spawning evacuate' line for that VM, of course the error will be described that the resource cannot be found. Note: I'm based on Yoga branch. https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349 ----------------------------------------------------------------------- for instance_id in instance_list: msg = "Evacuation of instance started: '%s'" % instance_id self.update_details(msg, 0.5) instance = self.novaclient.get_server(self.context, instance_id) thread_pool.spawn_n(self._evacuate_and_confirm, context, instance, host_name, failed_evacuation_instances, reserved_host) -----------------------------------------------------------------------	Today my lab environment got exception like this: "Resource could not be found". My very rare case where the server reboots too fast ('reboot'), specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. 
The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed. 3) The 'compute' server is back up and running the request to delete the VM. 4) The 'Maskari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. Right at the line of code that gets the VM information through the Nova SDK, there is an error right before the 'spawning evacuate' line for that VM, of course the error will be described that the resource cannot be found. Note: I'm based on Yoga branch. https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349 ----------------------------------------------------------------------- for instance_id in instance_list: msg = "Evacuation of instance started: '%s'" % instance_id self.update_details(msg, 0.5) -> instance = self.novaclient.get_server(self.context, instance_id) thread_pool.spawn_n(self._evacuate_and_confirm, context, instance, host_name, failed_evacuation_instances, reserved_host) -----------------------------------------------------------------------
2023-06-11 03:40:46	VO LE HUY	description	Today my lab environment got exception like this: "Resource could not be found". My very rare case where the server reboots too fast ('reboot'), specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed. 3) The 'compute' server is back up and running the request to delete the VM. 4) The 'Maskari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. Right at the line of code that gets the VM information through the Nova SDK, there is an error right before the 'spawning evacuate' line for that VM, of course the error will be described that the resource cannot be found. Note: I'm based on Yoga branch. https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349 ----------------------------------------------------------------------- for instance_id in instance_list: msg = "Evacuation of instance started: '%s'" % instance_id self.update_details(msg, 0.5) -> instance = self.novaclient.get_server(self.context, instance_id) thread_pool.spawn_n(self._evacuate_and_confirm, context, instance, host_name, failed_evacuation_instances, reserved_host) -----------------------------------------------------------------------	Today my lab environment got exception like this: "Resource could not be found". My very rare case where the server reboots too fast ('reboot'), specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. 
The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed. 3) The 'compute' server is back up and running the request to delete the VM. 4) The 'Maskari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. Right at the line of code that gets the VM information through the Nova SDK, there is an error right before the 'spawning evacuate' line for that VM, of course the error will be described that the resource cannot be found. Note: I'm based on Yoga branch. https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349 ----------------------------------------------------------------------- for instance_id in instance_list: msg = "Evacuation of instance started: '%s'" % instance_id self.update_details(msg, 0.5) -> instance = self.novaclient.get_server(self.context, instance_id) thread_pool.spawn_n(self._evacuate_and_confirm, context, instance, host_name, failed_evacuation_instances, reserved_host) -----------------------------------------------------------------------
2023-06-11 09:31:06	VO LE HUY	description	Today my lab environment got exception like this: "Resource could not be found". My very rare case where the server reboots too fast ('reboot'), specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed. 3) The 'compute' server is back up and running the request to delete the VM. 4) The 'Maskari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. Right at the line of code that gets the VM information through the Nova SDK, there is an error right before the 'spawning evacuate' line for that VM, of course the error will be described that the resource cannot be found. Note: I'm based on Yoga branch. https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349 ----------------------------------------------------------------------- for instance_id in instance_list: msg = "Evacuation of instance started: '%s'" % instance_id self.update_details(msg, 0.5) -> instance = self.novaclient.get_server(self.context, instance_id) thread_pool.spawn_n(self._evacuate_and_confirm, context, instance, host_name, failed_evacuation_instances, reserved_host) -----------------------------------------------------------------------	Today my lab environment got exception like this: "Resource could not be found". My very rare case where the server reboots too fast ('reboot'), specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. 
The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed. 3) The 'compute' server is back up and running the request to delete the VM. 4) The 'Maskari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. Right at the line of code that gets the VM information through the Nova SDK, there is an error right before the 'spawning evacuate' line for that VM, of course the error will be described that the resource cannot be found. Note: I'm based on Yoga branch. https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349 ----------------------------------------------------------------------- for instance_id in instance_list: msg = "Evacuation of instance started: '%s'" % instance_id self.update_details(msg, 0.5) -> instance = self.novaclient.get_server(self.context, instance_id) thread_pool.spawn_n(self._evacuate_and_confirm, context, instance, host_name, failed_evacuation_instances, reserved_host) ----------------------------------------------------------------------- ... 
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/masakari/compute/nova.py", ler 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver return nova.servers.get(uuid) 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/v2/servers.py", l 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver return self._get("/servers/%s" % base.getid(server), "server") 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/base.py", line 35 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver resp, body = self.api.client.get(url) 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/keystoneauth1/adapter.py", l 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver return self.request(url, 'GET', **kwargs) 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/client.py", line 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver raise exceptions.from_response(resp, body, url, method) 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver masakari.exception.NotFound: Resource could not be found. ...
2023-06-11 09:36:33	VO LE HUY	description	Today my lab environment got exception like this: "Resource could not be found". My very rare case where the server reboots too fast ('reboot'), specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed. 3) The 'compute' server is back up and running the request to delete the VM. 4) The 'Maskari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. Right at the line of code that gets the VM information through the Nova SDK, there is an error right before the 'spawning evacuate' line for that VM, of course the error will be described that the resource cannot be found. Note: I'm based on Yoga branch. https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349 ----------------------------------------------------------------------- for instance_id in instance_list: msg = "Evacuation of instance started: '%s'" % instance_id self.update_details(msg, 0.5) -> instance = self.novaclient.get_server(self.context, instance_id) thread_pool.spawn_n(self._evacuate_and_confirm, context, instance, host_name, failed_evacuation_instances, reserved_host) ----------------------------------------------------------------------- ... 
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/masakari/compute/nova.py", ler 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver return nova.servers.get(uuid) 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/v2/servers.py", l 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver return self._get("/servers/%s" % base.getid(server), "server") 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/base.py", line 35 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver resp, body = self.api.client.get(url) 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/keystoneauth1/adapter.py", l 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver return self.request(url, 'GET', **kwargs) 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/client.py", line 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver raise exceptions.from_response(resp, body, url, method) 2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver masakari.exception.NotFound: Resource could not be found. ...	Today my lab environment got exception like this: "Resource could not be found". My case is very rare where the compute server unfortunately rebooted as soon as it received the request to delete the VM but could not do it, specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. 
The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed.
3) The 'compute' server comes back up and processes the pending request to delete the VM.
4) The 'Masakari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. At the line of code that gets the VM information through the Nova SDK, right before the 'spawning evacuate' line for that VM, an error is raised saying that the resource could not be found.
Note: I'm based on the Yoga branch.
https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349
-----------------------------------------------------------------------
for instance_id in instance_list:
    msg = "Evacuation of instance started: '%s'" % instance_id
    self.update_details(msg, 0.5)
->  instance = self.novaclient.get_server(self.context,
                                          instance_id)
    thread_pool.spawn_n(self._evacuate_and_confirm, context,
                        instance, host_name,
                        failed_evacuation_instances,
                        reserved_host)
-----------------------------------------------------------------------
...
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/masakari/compute/nova.py", ler
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver return nova.servers.get(uuid)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/v2/servers.py", l
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver return self._get("/servers/%s" % base.getid(server), "server")
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/base.py", line 35
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver resp, body = self.api.client.get(url)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/keystoneauth1/adapter.py", l
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver return self.request(url, 'GET', **kwargs)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/client.py", line
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver raise exceptions.from_response(resp, body, url, method)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver masakari.exception.NotFound: Resource could not be found.
...
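The race in the report (a VM is deleted between the moment the engine lists the instances and the moment it fetches each one) could be tolerated by catching the not-found error per instance instead of letting it abort the whole flow. The following is only a minimal sketch of that idea, not the actual Masakari fix: `NotFound` here is a local stand-in for `masakari.exception.NotFound`, and `FakeNovaClient` and `collect_evacuable_instances` are hypothetical names introduced for illustration so the example runs without an OpenStack deployment.

```python
import logging

LOG = logging.getLogger(__name__)


class NotFound(Exception):
    """Local stand-in for masakari.exception.NotFound (assumption)."""


class FakeNovaClient:
    """Hypothetical stub; only models get_server(context, instance_id)."""

    def __init__(self, servers):
        self._servers = servers

    def get_server(self, context, instance_id):
        try:
            return self._servers[instance_id]
        except KeyError:
            raise NotFound("Resource could not be found.") from None


def collect_evacuable_instances(novaclient, context, instance_list):
    """Fetch each instance, skipping those deleted since they were listed.

    Returns a tuple (instances, skipped_ids).
    """
    instances, skipped = [], []
    for instance_id in instance_list:
        try:
            instance = novaclient.get_server(context, instance_id)
        except NotFound:
            # The VM vanished after the listing step; nothing to evacuate.
            LOG.warning("Instance %s no longer exists, skipping evacuation",
                        instance_id)
            skipped.append(instance_id)
            continue
        instances.append(instance)
    return instances, skipped


if __name__ == "__main__":
    # "vm-2" plays the VM that was deleted while the host was rebooting.
    client = FakeNovaClient({"vm-1": {"id": "vm-1"}, "vm-3": {"id": "vm-3"}})
    found, missing = collect_evacuable_instances(
        client, None, ["vm-1", "vm-2", "vm-3"])
    print([s["id"] for s in found], missing)
```

In the real loop, the instances that survive this check would then be handed to `thread_pool.spawn_n(self._evacuate_and_confirm, ...)` as in the snippet above. Whether a missing instance should be silently skipped or recorded in the notification details is a design decision for the maintainers.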