Activity log for bug #2023464

Date Who What changed Old value New value Message
2023-06-11 03:12:22 VO LE HUY bug added bug
2023-06-11 03:12:22 VO LE HUY attachment added Crash evacuate flow cause be get deleted vm without try catch https://bugs.launchpad.net/bugs/2023464/+attachment/5679066/+files/get_deleted_vm_will_crash_flow.PNG
2023-06-11 03:12:44 VO LE HUY description Today my lab environment got exception like this: "Resource could not be found". My very rare case where the server reboots too fast ('reboot'), specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed. 3) The 'compute' server is back up and running the request to delete the VM. 4) The 'Maskari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. Right at the line of code that gets the VM information through the Nova SDK, there is an error right before the 'spawning evacuate' line for that VM, of course the error will be described that the resource cannot be found. Note: I'm based on Yoga branch. https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349 ----------------------------------------------------------------------- for instance_id in instance_list: | msg = "Evacuation of instance started: '%s'" % instance_id | self.update_details(msg, 0.5) | instance = self.novaclient.get_server(self.context, | instance_id) | thread_pool.spawn_n(self._evacuate_and_confirm, context, | instance, host_name, | failed_evacuation_instances, | reserved_host) | ----------------------------------------------------------------------- Today my lab environment got exception like this: "Resource could not be found". My very rare case where the server reboots too fast ('reboot'), specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. 
The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed. 3) The 'compute' server is back up and running the request to delete the VM. 4) The 'Maskari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. Right at the line of code that gets the VM information through the Nova SDK, there is an error right before the 'spawning evacuate' line for that VM, of course the error will be described that the resource cannot be found. Note: I'm based on Yoga branch. https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349 ----------------------------------------------------------------------- for instance_id in instance_list:         msg = "Evacuation of instance started: '%s'" % instance_id         self.update_details(msg, 0.5)         instance = self.novaclient.get_server(self.context,                                               instance_id)         thread_pool.spawn_n(self._evacuate_and_confirm, context,                                 instance, host_name,                                 failed_evacuation_instances,                                 reserved_host) -----------------------------------------------------------------------
2023-06-11 03:16:36 VO LE HUY tags found not resource before evacuate found not resource
2023-06-11 03:16:50 VO LE HUY tags before evacuate found not resource not-found-resource-before-evacuate
2023-06-11 03:24:01 VO LE HUY description Today my lab environment got exception like this: "Resource could not be found". My very rare case where the server reboots too fast ('reboot'), specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed. 3) The 'compute' server is back up and running the request to delete the VM. 4) The 'Maskari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. Right at the line of code that gets the VM information through the Nova SDK, there is an error right before the 'spawning evacuate' line for that VM, of course the error will be described that the resource cannot be found. Note: I'm based on Yoga branch. https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349 ----------------------------------------------------------------------- for instance_id in instance_list:         msg = "Evacuation of instance started: '%s'" % instance_id         self.update_details(msg, 0.5)         instance = self.novaclient.get_server(self.context,                                               instance_id)         thread_pool.spawn_n(self._evacuate_and_confirm, context,                                 instance, host_name,                                 failed_evacuation_instances,                                 reserved_host) ----------------------------------------------------------------------- Today my lab environment got exception like this: "Resource could not be found". 
My very rare case where the server reboots too fast ('reboot'), specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed. 3) The 'compute' server is back up and running the request to delete the VM. 4) The 'Maskari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. Right at the line of code that gets the VM information through the Nova SDK, there is an error right before the 'spawning evacuate' line for that VM, of course the error will be described that the resource cannot be found. Note: I'm based on Yoga branch. https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349 ----------------------------------------------------------------------- for instance_id in instance_list:         msg = "Evacuation of instance started: '%s'" % instance_id         self.update_details(msg, 0.5) ->      instance = self.novaclient.get_server(self.context,                                               instance_id)         thread_pool.spawn_n(self._evacuate_and_confirm, context,                                 instance, host_name,                                 failed_evacuation_instances,                                 reserved_host) -----------------------------------------------------------------------
2023-06-11 03:40:46 VO LE HUY description Today my lab environment got exception like this: "Resource could not be found". My very rare case where the server reboots too fast ('reboot'), specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed. 3) The 'compute' server is back up and running the request to delete the VM. 4) The 'Maskari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. Right at the line of code that gets the VM information through the Nova SDK, there is an error right before the 'spawning evacuate' line for that VM, of course the error will be described that the resource cannot be found. Note: I'm based on Yoga branch. https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349 ----------------------------------------------------------------------- for instance_id in instance_list:         msg = "Evacuation of instance started: '%s'" % instance_id         self.update_details(msg, 0.5) ->      instance = self.novaclient.get_server(self.context,                                               instance_id)         thread_pool.spawn_n(self._evacuate_and_confirm, context,                                 instance, host_name,                                 failed_evacuation_instances,                                 reserved_host) ----------------------------------------------------------------------- Today my lab environment got exception like this: "Resource could not be found". 
My very rare case where the server reboots too fast ('reboot'), specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed. 3) The 'compute' server is back up and running the request to delete the VM. 4) The 'Maskari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. Right at the line of code that gets the VM information through the Nova SDK, there is an error right before the 'spawning evacuate' line for that VM, of course the error will be described that the resource cannot be found. Note: I'm based on Yoga branch. https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349 ----------------------------------------------------------------------- for instance_id in instance_list:         msg = "Evacuation of instance started: '%s'" % instance_id         self.update_details(msg, 0.5) ->      instance = self.novaclient.get_server(self.context,                                               instance_id)         thread_pool.spawn_n(self._evacuate_and_confirm, context,                             instance, host_name,                             failed_evacuation_instances,                             reserved_host) -----------------------------------------------------------------------
2023-06-11 09:31:06 VO LE HUY description Today my lab environment got exception like this: "Resource could not be found". My very rare case where the server reboots too fast ('reboot'), specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed. 3) The 'compute' server is back up and running the request to delete the VM. 4) The 'Maskari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. Right at the line of code that gets the VM information through the Nova SDK, there is an error right before the 'spawning evacuate' line for that VM, of course the error will be described that the resource cannot be found. Note: I'm based on Yoga branch. https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349 ----------------------------------------------------------------------- for instance_id in instance_list:         msg = "Evacuation of instance started: '%s'" % instance_id         self.update_details(msg, 0.5) ->      instance = self.novaclient.get_server(self.context,                                               instance_id)         thread_pool.spawn_n(self._evacuate_and_confirm, context,                             instance, host_name,                             failed_evacuation_instances,                             reserved_host) ----------------------------------------------------------------------- Today my lab environment got exception like this: "Resource could not be found". 
My very rare case where the server reboots too fast ('reboot'), specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed. 3) The 'compute' server is back up and running the request to delete the VM. 4) The 'Maskari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. Right at the line of code that gets the VM information through the Nova SDK, there is an error right before the 'spawning evacuate' line for that VM, of course the error will be described that the resource cannot be found. Note: I'm based on Yoga branch. https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349 ----------------------------------------------------------------------- for instance_id in instance_list:         msg = "Evacuation of instance started: '%s'" % instance_id         self.update_details(msg, 0.5) ->      instance = self.novaclient.get_server(self.context,                                               instance_id)         thread_pool.spawn_n(self._evacuate_and_confirm, context,                             instance, host_name,                             failed_evacuation_instances,                             reserved_host) ----------------------------------------------------------------------- ... 
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver   File "/var/lib/kolla/venv/lib/python3.8/site-packages/masakari/compute/nova.py", ler
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver     return nova.servers.get(uuid)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver   File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/v2/servers.py", l
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver     return self._get("/servers/%s" % base.getid(server), "server")
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver   File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/base.py", line 35
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver     resp, body = self.api.client.get(url)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver   File "/var/lib/kolla/venv/lib/python3.8/site-packages/keystoneauth1/adapter.py", l
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver     return self.request(url, 'GET', **kwargs)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver   File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/client.py", line
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver     raise exceptions.from_response(resp, body, url, method)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver masakari.exception.NotFound: Resource could not be found.
...
2023-06-11 09:36:33 VO LE HUY description Today my lab environment got exception like this: "Resource could not be found". My very rare case where the server reboots too fast ('reboot'), specifically look up the following behavior, note that it has concurrency: 1) Server 'controller' received request to delete VM (normal or amphora). 2) The 'compute' server is off, the above request is still unprocessed just stop at the waiting queue. The 'Masakari Engine' on the 'controller' server has listed the VMs located on the server that just crashed. 3) The 'compute' server is back up and running the request to delete the VM. 4) The 'Maskari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. Right at the line of code that gets the VM information through the Nova SDK, there is an error right before the 'spawning evacuate' line for that VM, of course the error will be described that the resource cannot be found. Note: I'm based on Yoga branch. https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349 ----------------------------------------------------------------------- for instance_id in instance_list:         msg = "Evacuation of instance started: '%s'" % instance_id         self.update_details(msg, 0.5) ->      instance = self.novaclient.get_server(self.context,                                               instance_id)         thread_pool.spawn_n(self._evacuate_and_confirm, context,                             instance, host_name,                             failed_evacuation_instances,                             reserved_host) ----------------------------------------------------------------------- ... 
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver   File "/var/lib/kolla/venv/lib/python3.8/site-packages/masakari/compute/nova.py", ler
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver     return nova.servers.get(uuid)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver   File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/v2/servers.py", l
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver     return self._get("/servers/%s" % base.getid(server), "server")
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver   File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/base.py", line 35
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver     resp, body = self.api.client.get(url)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver   File "/var/lib/kolla/venv/lib/python3.8/site-packages/keystoneauth1/adapter.py", l
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver     return self.request(url, 'GET', **kwargs)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver   File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/client.py", line
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver     raise exceptions.from_response(resp, body, url, method)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver masakari.exception.NotFound: Resource could not be found.
...
Today my lab environment hit an exception like this: "Resource could not be found". My case is very rare: the compute server unfortunately rebooted just as it received the request to delete the VM, before it could process it. The specific sequence is as follows; note that it involves concurrency:
1) The 'controller' server receives a request to delete a VM (normal or amphora).
2) The 'compute' server is off, so the above request stays unprocessed in the waiting queue.
The 'Masakari Engine' on the 'controller' server has already listed the VMs located on the server that just crashed.
3) The 'compute' server comes back up and processes the request to delete the VM.
4) The 'Masakari Engine' on the 'controller' server continues execution to the step in the source code called 'Task Evacuate'. The line that fetches the VM information through the Nova SDK, right before the 'spawning evacuate' line for that VM, raises an error saying that the resource cannot be found.
Note: I'm on the Yoga branch.
https://opendev.org/openstack/masakari/src/branch/stable/yoga/masakari/engine/drivers/taskflow/host_failure.py#L349
-----------------------------------------------------------------------
for instance_id in instance_list:
        msg = "Evacuation of instance started: '%s'" % instance_id
        self.update_details(msg, 0.5)
->      instance = self.novaclient.get_server(self.context,
                                              instance_id)
        thread_pool.spawn_n(self._evacuate_and_confirm, context,
                            instance, host_name,
                            failed_evacuation_instances,
                            reserved_host)
-----------------------------------------------------------------------
...
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver   File "/var/lib/kolla/venv/lib/python3.8/site-packages/masakari/compute/nova.py", ler
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver     return nova.servers.get(uuid)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver   File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/v2/servers.py", l
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver     return self._get("/servers/%s" % base.getid(server), "server")
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver   File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/base.py", line 35
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver     resp, body = self.api.client.get(url)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver   File "/var/lib/kolla/venv/lib/python3.8/site-packages/keystoneauth1/adapter.py", l
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver     return self.request(url, 'GET', **kwargs)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver   File "/var/lib/kolla/venv/lib/python3.8/site-packages/novaclient/client.py", line
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver     raise exceptions.from_response(resp, body, url, method)
2023-06-11 05:02:04.326 7 ERROR masakari.engine.drivers.taskflow.driver masakari.exception.NotFound: Resource could not be found.
...
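One possible way to tolerate this race, shown only as a minimal sketch and not as the actual Masakari fix: catch the not-found error around the lookup and skip that VM instead of letting the whole evacuation flow fail. The names `NotFound`, `FakeNovaClient` and `lookup_existing_instances` are hypothetical stand-ins so the example runs on its own; in the real code the exception would be `masakari.exception.NotFound` and the lookup would be `self.novaclient.get_server(...)` as in the snippet above.

```python
class NotFound(Exception):
    """Hypothetical stand-in for masakari.exception.NotFound."""


class FakeNovaClient:
    """Hypothetical stub: lookups fail for VMs that were already deleted."""

    def __init__(self, servers):
        self._servers = dict(servers)

    def get_server(self, context, instance_id):
        try:
            return self._servers[instance_id]
        except KeyError:
            raise NotFound("Resource could not be found.") from None


def lookup_existing_instances(novaclient, context, instance_list):
    """Return (found, skipped): servers still present, and IDs that vanished."""
    found, skipped = [], []
    for instance_id in instance_list:
        try:
            instance = novaclient.get_server(context, instance_id)
        except NotFound:
            # The VM was deleted after the instance list was built,
            # so there is nothing left to evacuate: record it and move on.
            skipped.append(instance_id)
            continue
        found.append(instance)
    return found, skipped


# "vm-2" plays the role of the VM deleted while the compute host was rebooting.
client = FakeNovaClient({"vm-1": {"id": "vm-1"}, "vm-3": {"id": "vm-3"}})
found, skipped = lookup_existing_instances(client, None, ["vm-1", "vm-2", "vm-3"])
print("to evacuate:", found)
print("skipped (already deleted):", skipped)
```

In the real loop, only the instances in `found` would be handed to `thread_pool.spawn_n(...)`; whether a skipped VM should also be logged or reported in the notification details is a design choice this sketch leaves open.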