when an instance qemu-kvm pid abnormally terminated, it triggered an instance failure notification. The recovery workflow for the instance faiure is to stop then start the instance. In my env, the instance stop action random failed, because the nova-compute would periodically sync the instance status. The instance showed active but calling stop action it just returned 409 "code": 409, "message": "Cannot 'stop' instance be66077f-1092-4490-bf2c-a5be6268fd52 while it is in vm_state stopped".
masakari-engine.log
2022-07-05 17:41:47.818 28 INFO masakari.compute.nova [req-ea99511d-29a5-43d7-8b99-a03f8fd46d87 req-b21745a8-69d0-419a-8045-c0f4bc90d15b nova - - - -] Call get server command for instance be66077f-1092-4490-bf2c-a5be6268fd52
2022-07-05 17:41:48.185 28 DEBUG novaclient.v2.client [req-ea99511d-29a5-43d7-8b99-a03f8fd46d87 req-b21745a8-69d0-419a-8045-c0f4bc90d15b nova - - - -] REQ: curl -g -i -X GET http://100.2.244.113:5080/nova/v2.1/8ae58d5fe1864792bba4dcd79b9b4ccd/servers/be66077f-1092-4490-bf2c-a5be6268fd52 -H "Accept: application/json" -H "OpenStack-API-Version: compute 2.56" -H "User-Agent: python-novaclient" -H "X-Auth-Token: {SHA256}6107c32c79f4d88bb6c96020b8c1de43a98a7747651cacc0825874d6b6decf7b" -H "X-OpenStack-Nova-API-Version: 2.56" -H "X-OpenStack-Request-ID: req-ea99511d-29a5-43d7-8b99-a03f8fd46d87" _http_log_request /var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py:517
2022-07-05 17:41:48.445 28 DEBUG novaclient.v2.client [req-ea99511d-29a5-43d7-8b99-a03f8fd46d87 req-b21745a8-69d0-419a-8045-c0f4bc90d15b nova - - - -] RESP: [200] Connection: keep-alive Content-Length: 2085 Content-Type: application/json Date: Tue, 05 Jul 2022 09:41:48 GMT Openstack-Api-Version: compute 2.56 Vary: OpenStack-API-Version, X-OpenStack-Nova-API-Version X-Compute-Request-Id: req-9e1f1d7d-dbbc-4979-aca3-1c702cd12cd2 X-Openstack-Nova-Api-Version: 2.56 X-Openstack-Request-Id: req-9e1f1d7d-dbbc-4979-aca3-1c702cd12cd2 _http_log_response /var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py:548
2022-07-05 17:41:48.446 28 DEBUG novaclient.v2.client [req-ea99511d-29a5-43d7-8b99-a03f8fd46d87 req-b21745a8-69d0-419a-8045-c0f4bc90d15b nova - - - -] RESP BODY: {"server": {"id": "be66077f-1092-4490-bf2c-a5be6268fd52", "name": "scyrbd007", "status": "ACTIVE", "tenant_id": "c2bcf4067e8a4bacb20619ddadbd7377", "user_id": "21c0dcf1bc78464c8df149a1ae8a5304", "metadata": {"panic": "", "cpu_mode": "host-model", "source_type": "volume", "serial_number": "0daf20e6-7103-4bb4-935d-4b96062e635c"}, "hostId": "6560346edd56072f5c98239a5dfb938aaa52b45a00fa56f90a940773", "image": "", "flavor": {"vcpus": 1, "ram": 512, "disk": 1, "ephemeral": 0, "swap": 0, "original_name": "m1.tiny", "extra_specs": {"capabilities:hypervisor_type": "QEMU"}}, "created": "2022-07-05T06:49:26Z", "updated": "2022-07-05T08:51:03Z", "addresses": {"zqx_net": [{"version": 4, "addr": "192.168.67.74", "OS-EXT-IPS:type": "fixed", "OS-EXT-IPS-MAC:mac_addr": "fa:16:3e:4e:ab:9f"}]}, "accessIPv4": "", "accessIPv6": "", "links": [{"rel": "self", "href": "http://100.2.244.113:5080/v2.1/8ae58d5fe1864792bba4dcd79b9b4ccd/servers/be66077f-1092-4490-bf2c-a5be6268fd52"}, {"rel": "bookmark", "href": "http://100.2.244.113:5080/8ae58d5fe1864792bba4dcd79b9b4ccd/servers/be66077f-1092-4490-bf2c-a5be6268fd52"}], "OS-DCF:diskConfig": "AUTO", "progress": 0, "locked": false, "description": "", "tags": [], "config_drive": "True", "OS-EXT-AZ:availability_zone": "nova", "OS-EXT-SRV-ATTR:hypervisor_hostname": "node3.cluster.local", "OS-EXT-SRV-ATTR:host": "node3.cluster.local", "OS-EXT-SRV-ATTR:instance_name": "instance-00000038", "OS-EXT-SRV-ATTR:reservation_id": "r-wfbn2dyz", "OS-EXT-SRV-ATTR:launch_index": 0, "OS-EXT-SRV-ATTR:hostname": "scyrbd007", "OS-EXT-SRV-ATTR:kernel_id": "", "OS-EXT-SRV-ATTR:ramdisk_id": "", "OS-EXT-SRV-ATTR:root_device_name": "/dev/vda", "OS-EXT-SRV-ATTR:user_data": null, "host_status": "UP", "OS-EXT-STS:task_state": null, "OS-EXT-STS:vm_state": "active", "OS-EXT-STS:power_state": 1, "os-extended-volumes:volumes_attached": [{"id": "9f982463-b85d-4f48-9a1c-6edb9b1c7a74", "delete_on_termination": false}], "key_name": null, "security_groups": [{"name": "default"}], "OS-SRV-USG:launched_at": "2022-07-05T06:50:08.000000", "OS-SRV-USG:terminated_at": null}} _http_log_response /var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py:580
2022-07-05 17:41:48.446 28 DEBUG novaclient.v2.client [req-ea99511d-29a5-43d7-8b99-a03f8fd46d87 req-b21745a8-69d0-419a-8045-c0f4bc90d15b nova - - - -] GET call to nova for http://100.2.244.113:5080/nova/v2.1/8ae58d5fe1864792bba4dcd79b9b4ccd/servers/be66077f-1092-4490-bf2c-a5be6268fd52 used request id req-9e1f1d7d-dbbc-4979-aca3-1c702cd12cd2 request /var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py:943
2022-07-05 17:41:48.447 28 DEBUG masakari.compute.nova [req-ea99511d-29a5-43d7-8b99-a03f8fd46d87 req-9411b91e-ed81-4a81-8ca5-a48716c436f3 nova - - - -] Creating a Nova client using "nova" user novaclient /var/lib/kolla/venv/lib/python3.6/site-packages/masakari/compute/nova.py:109
2022-07-05 17:41:48.449 28 INFO masakari.compute.nova [req-ea99511d-29a5-43d7-8b99-a03f8fd46d87 req-9411b91e-ed81-4a81-8ca5-a48716c436f3 nova - - - -] Call stop server command for instance be66077f-1092-4490-bf2c-a5be6268fd52
2022-07-05 17:41:48.812 28 DEBUG novaclient.v2.client [req-ea99511d-29a5-43d7-8b99-a03f8fd46d87 req-9411b91e-ed81-4a81-8ca5-a48716c436f3 nova - - - -] REQ: curl -g -i -X POST http://100.2.244.113:5080/nova/v2.1/8ae58d5fe1864792bba4dcd79b9b4ccd/servers/be66077f-1092-4490-bf2c-a5be6268fd52/action -H "Accept: application/json" -H "Content-Type: application/json" -H "OpenStack-API-Version: compute 2.56" -H "User-Agent: python-novaclient" -H "X-Auth-Token: {SHA256}d18bb647ea28f4167bb4b4fe97cf41980cc9704dd8857a6bf09314a5c98c3788" -H "X-OpenStack-Nova-API-Version: 2.56" -H "X-OpenStack-Request-ID: req-ea99511d-29a5-43d7-8b99-a03f8fd46d87" -d '{"os-stop": null}' _http_log_request /var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py:517
2022-07-05 17:41:48.924 28 DEBUG novaclient.v2.client [req-ea99511d-29a5-43d7-8b99-a03f8fd46d87 req-9411b91e-ed81-4a81-8ca5-a48716c436f3 nova - - - -] RESP: [409] Connection: keep-alive Content-Length: 143 Content-Type: application/json; charset=UTF-8 Date: Tue, 05 Jul 2022 09:41:48 GMT Openstack-Api-Version: compute 2.56 Vary: OpenStack-API-Version, X-OpenStack-Nova-API-Version X-Compute-Request-Id: req-3b6d1ff0-dff8-4d0f-b2b7-6368387b9b27 X-Openstack-Nova-Api-Version: 2.56 X-Openstack-Request-Id: req-3b6d1ff0-dff8-4d0f-b2b7-6368387b9b27 _http_log_response /var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py:548
2022-07-05 17:41:48.925 28 DEBUG novaclient.v2.client [req-ea99511d-29a5-43d7-8b99-a03f8fd46d87 req-9411b91e-ed81-4a81-8ca5-a48716c436f3 nova - - - -] RESP BODY: {"conflictingRequest": {"code": 409, "message": "Cannot 'stop' instance be66077f-1092-4490-bf2c-a5be6268fd52 while it is in vm_state stopped"}} _http_log_response /var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py:580
nova-compute.log
2022-07-05 17:41:48.344 444801 INFO nova.compute.manager [- req-None - - - - -] [instance: be66077f-1092-4490-bf2c-a5be6268fd52] VM Stopped (Lifecycle Event)
2022-07-05 17:41:48.382 444801 INFO nova.compute.manager [None req-9aff95d8-54f8-4a20-9db7-1a529805435c - - - - -] Lifecycle Event is 1
2022-07-05 17:41:48.422 444801 INFO nova.compute.manager [None req-9aff95d8-54f8-4a20-9db7-1a529805435c - - - - -] [instance: be66077f-1092-4490-bf2c-a5be6268fd52] During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (4). Updating power_state in the DB to match the hypervisor.
2022-07-05 17:41:48.476 444801 WARNING nova.compute.manager [None req-9aff95d8-54f8-4a20-9db7-1a529805435c - - - - -] [instance: be66077f-1092-4490-bf2c-a5be6268fd52] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 1, current VM power_state: 4
2022-07-05 17:41:48.562 444801 INFO nova.compute.manager [None req-9aff95d8-54f8-4a20-9db7-1a529805435c - - - - -] [instance: be66077f-1092-4490-bf2c-a5be6268fd52] Instance is already powered off in the hypervisor when stop is called.
2022-07-05 17:41:48.642 444801 INFO nova.virt.driver [None req-9aff95d8-54f8-4a20-9db7-1a529805435c - - - - -] [instance: be66077f-1092-4490-bf2c-a5be6268fd52] Instance already shutdown.
2022-07-05 17:41:48.647 444801 INFO nova.virt.driver [- req-None - - - - -] [instance: be66077f-1092-4490-bf2c-a5be6268fd52] Instance destroyed successfully.
Reviewed: https:/ /review. opendev. org/c/openstack /masakari/ +/850866 /opendev. org/openstack/ masakari/ commit/ 7241101163550b3 cd981262468dc52 3ccb25cc9e
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit 7241101163550b3 cd981262468dc52 3ccb25cc9e
Author: suzhengwei <email address hidden>
Date: Mon Jul 25 17:28:31 2022 +0800
Fix stopping stopped instances
Nova-compute would periodically sync the instance status. So in the
instance failure recovery workflow,it would stop instance random failed
because of conflict, which will terminate the recovery workflow.
This patch can catch the Confict exception and continue the recovery
workflow if the instance already stopped.
Closes-Bug: 1980736 c1ddc8f4c362e96 7a15b8ec5e8
Change-Id: I59a1f9d7078614