When masakari detects a hypervisor failure and tries to relocate a VM, and the VM goes into an ERROR state for any external reason, masakari is left with a notification stuck in the running state, which prevents the segment host from being modified.
ubuntu@masakari:~$ openstack segment host list 18dc0b80-312d-45fd-83e2-14c5949033cb
+--------------------------------------+------------+---------+--------------------+----------+----------------+--------------------------------------+
| uuid | name | type | control_attributes | reserved | on_maintenance | failover_segment_id |
+--------------------------------------+------------+---------+--------------------+----------+----------------+--------------------------------------+
| fee6f658-424d-41bd-b973-abd09e6c9322 | node1.maas | COMPUTE | SSH | False | False | 18dc0b80-312d-45fd-83e2-14c5949033cb |
| 053d88f9-76f4-4689-bb78-e1f33d4fe65d | node2.maas | COMPUTE | SSH | False | True | 18dc0b80-312d-45fd-83e2-14c5949033cb |
| 914a41b5-8c3d-4ac2-b811-e62b1a284707 | node5.maas | COMPUTE | SSH | False | False | 18dc0b80-312d-45fd-83e2-14c5949033cb |
| 7784ef2f-070b-4b7a-8530-938a48220daf | node3.maas | COMPUTE | SSH | False | False | 18dc0b80-312d-45fd-83e2-14c5949033cb |
+--------------------------------------+------------+---------+--------------------+----------+----------------+--------------------------------------+
ubuntu@masakari:~$ openstack segment host update 18dc0b80-312d-45fd-83e2-14c5949033cb node2.maas --on_maintenance False
ConflictException: 409: Client Error for url: http://10.0.5.29:15868/v1/94be942354c949d88d5c45f3ae13d645/segments/18dc0b80-312d-45fd-83e2-14c5949033cb/hosts/053d88f9-76f4-4689-bb78-e1f33d4fe65d, Host 053d88f9-76f4-4689-bb78-e1f33d4fe65d can't be updated as it is in-use to process notifications.
ubuntu@masakari:~$ openstack notification list
+--------------------------------------+----------------------------+----------+--------------+--------------------------------------+----------------------------------------------------------------------------+
| notification_uuid | generated_time | status | type | source_host_uuid | payload |
+--------------------------------------+----------------------------+----------+--------------+--------------------------------------+----------------------------------------------------------------------------+
| 958ae386-9da0-4399-950e-3529b581aeb0 | 2020-06-04T15:40:08.000000 | running | COMPUTE_HOST | 053d88f9-76f4-4689-bb78-e1f33d4fe65d | {'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status': 'NORMAL'} |
| 59e2d55f-b48e-442b-854f-7bd37de4a8c6 | 2020-06-04T15:32:01.000000 | finished | COMPUTE_HOST | 73290564-ceff-488b-bcf9-961236e6fb2b | {'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status': 'NORMAL'} |
| 149350cb-7e9f-4e5b-bcc9-63bf9001e90f | 2020-06-04T15:17:56.000000 | finished | COMPUTE_HOST | 5fc79709-56ea-4129-9bca-69f7aaa4214a | {'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status': 'NORMAL'} |
| 80d00f33-cb7c-4237-b237-645797d32ade | 2020-06-04T15:04:53.000000 | finished | COMPUTE_HOST | 2d286880-09fc-4ab4-8ff3-a7cb597e378a | {'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status': 'NORMAL'} |
+--------------------------------------+----------------------------+----------+--------------+--------------------------------------+----------------------------------------------------------------------------+
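To see which notification is holding the host, the listing above can be filtered by status and source host. The following is only a minimal sketch: it assumes the rows are available as JSON (for example from `openstack notification list -f json`) and that the JSON keys match the column headers shown above, which I have not verified. The helper name `blocking_notifications` is made up for illustration, and the sample rows are copied from the listing, trimmed to the three fields used.

```python
import json


def blocking_notifications(notifications, host_uuid):
    """Return the UUIDs of notifications still 'running' for host_uuid."""
    return [
        n["notification_uuid"]
        for n in notifications
        if n["source_host_uuid"] == host_uuid and n["status"] == "running"
    ]


# Rows copied from the listing above (payload and time columns omitted).
# Assumption: JSON keys are the same as the table column headers.
SAMPLE = json.loads("""
[
  {"notification_uuid": "958ae386-9da0-4399-950e-3529b581aeb0",
   "status": "running",
   "source_host_uuid": "053d88f9-76f4-4689-bb78-e1f33d4fe65d"},
  {"notification_uuid": "59e2d55f-b48e-442b-854f-7bd37de4a8c6",
   "status": "finished",
   "source_host_uuid": "73290564-ceff-488b-bcf9-961236e6fb2b"},
  {"notification_uuid": "149350cb-7e9f-4e5b-bcc9-63bf9001e90f",
   "status": "finished",
   "source_host_uuid": "5fc79709-56ea-4129-9bca-69f7aaa4214a"},
  {"notification_uuid": "80d00f33-cb7c-4237-b237-645797d32ade",
   "status": "finished",
   "source_host_uuid": "2d286880-09fc-4ab4-8ff3-a7cb597e378a"}
]
""")

# node2.maas is the host whose update was rejected with the 409.
NODE2_UUID = "053d88f9-76f4-4689-bb78-e1f33d4fe65d"

if __name__ == "__main__":
    for uuid in blocking_notifications(SAMPLE, NODE2_UUID):
        print(uuid)
```

With the sample data this singles out the one running notification whose source_host_uuid is the host named in the 409 error.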
Steps to reproduce:
- disable all hypervisors but one, e.g. openstack compute service set node5.maas nova-compute --disable
- launch a server on the remaining hypervisor
- kill that hypervisor to trigger the masakari action
- masakari sees the failure and tries to relocate the server, but as there is no hypervisor available, the server goes into ERROR state
Environment: bionic-stein
Note: this happens for any ERROR reason; the above is just the easiest way to reproduce it. I saw it when glance became briefly unavailable during a relocation and the relocation failed.
I *believe* that this bug is a duplicate of LP Bug #1882656 [0]. Liam will be back tomorrow and can confirm or disconfirm.
[0] https://bugs.launchpad.net/charm-masakari/+bug/1882656