Segment cannot be modified after a server goes into ERROR state.

Bug #1882113 reported by Marian Gasparovic
This bug affects 3 people
Affects                   Status  Importance  Assigned to  Milestone
OpenStack Masakari Charm  New     Undecided   Unassigned

Bug Description

When Masakari detects a hypervisor failure and tries to relocate a VM, and the VM goes into ERROR state for any external reason, Masakari is left with a notification stuck in the "running" state. That notification prevents the segment's hosts from being modified:

ubuntu@masakari:~$ openstack segment host list 18dc0b80-312d-45fd-83e2-14c5949033cb
+--------------------------------------+------------+---------+--------------------+----------+----------------+--------------------------------------+
| uuid | name | type | control_attributes | reserved | on_maintenance | failover_segment_id |
+--------------------------------------+------------+---------+--------------------+----------+----------------+--------------------------------------+
| fee6f658-424d-41bd-b973-abd09e6c9322 | node1.maas | COMPUTE | SSH | False | False | 18dc0b80-312d-45fd-83e2-14c5949033cb |
| 053d88f9-76f4-4689-bb78-e1f33d4fe65d | node2.maas | COMPUTE | SSH | False | True | 18dc0b80-312d-45fd-83e2-14c5949033cb |
| 914a41b5-8c3d-4ac2-b811-e62b1a284707 | node5.maas | COMPUTE | SSH | False | False | 18dc0b80-312d-45fd-83e2-14c5949033cb |
| 7784ef2f-070b-4b7a-8530-938a48220daf | node3.maas | COMPUTE | SSH | False | False | 18dc0b80-312d-45fd-83e2-14c5949033cb |
+--------------------------------------+------------+---------+--------------------+----------+----------------+--------------------------------------+

ubuntu@masakari:~$ openstack segment host update 18dc0b80-312d-45fd-83e2-14c5949033cb node2.maas --on_maintenance False
ConflictException: 409: Client Error for url: http://10.0.5.29:15868/v1/94be942354c949d88d5c45f3ae13d645/segments/18dc0b80-312d-45fd-83e2-14c5949033cb/hosts/053d88f9-76f4-4689-bb78-e1f33d4fe65d, Host 053d88f9-76f4-4689-bb78-e1f33d4fe65d can't be updated as it is in-use to process notifications.

ubuntu@masakari:~$ openstack notification list
+--------------------------------------+----------------------------+----------+--------------+--------------------------------------+----------------------------------------------------------------------------+
| notification_uuid | generated_time | status | type | source_host_uuid | payload |
+--------------------------------------+----------------------------+----------+--------------+--------------------------------------+----------------------------------------------------------------------------+
| 958ae386-9da0-4399-950e-3529b581aeb0 | 2020-06-04T15:40:08.000000 | running | COMPUTE_HOST | 053d88f9-76f4-4689-bb78-e1f33d4fe65d | {'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status': 'NORMAL'} |
| 59e2d55f-b48e-442b-854f-7bd37de4a8c6 | 2020-06-04T15:32:01.000000 | finished | COMPUTE_HOST | 73290564-ceff-488b-bcf9-961236e6fb2b | {'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status': 'NORMAL'} |
| 149350cb-7e9f-4e5b-bcc9-63bf9001e90f | 2020-06-04T15:17:56.000000 | finished | COMPUTE_HOST | 5fc79709-56ea-4129-9bca-69f7aaa4214a | {'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status': 'NORMAL'} |
| 80d00f33-cb7c-4237-b237-645797d32ade | 2020-06-04T15:04:53.000000 | finished | COMPUTE_HOST | 2d286880-09fc-4ab4-8ff3-a7cb597e378a | {'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status': 'NORMAL'} |
+--------------------------------------+----------------------------+----------+--------------+--------------------------------------+----------------------------------------------------------------------------+
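To see which notification is holding the host, the two tables above can be cross-referenced. The following is a minimal Python sketch, not part of Masakari or its client. It assumes that a host is blocked when a notification with a matching source_host_uuid is still in the "running" status; that rule is inferred from the 409 message and the output above, not taken from Masakari's source. parse_table and find_blockers are hypothetical helpers, and the table text is pasted from this report rather than fetched from the API.

```python
"""Cross-reference `openstack segment host list` with `openstack notification list`.

ASSUMPTION: a host counts as "pinned" when a notification whose
source_host_uuid matches it is still in a blocking status ("running" by
default). This rule is inferred from the 409 error in this report, not
taken from Masakari's source code.
"""
from typing import Dict, Iterable, List

# Rows pasted from the CLI output in this report (border lines omitted).
HOST_LIST = """
| uuid | name | type | control_attributes | reserved | on_maintenance | failover_segment_id |
| fee6f658-424d-41bd-b973-abd09e6c9322 | node1.maas | COMPUTE | SSH | False | False | 18dc0b80-312d-45fd-83e2-14c5949033cb |
| 053d88f9-76f4-4689-bb78-e1f33d4fe65d | node2.maas | COMPUTE | SSH | False | True | 18dc0b80-312d-45fd-83e2-14c5949033cb |
| 914a41b5-8c3d-4ac2-b811-e62b1a284707 | node5.maas | COMPUTE | SSH | False | False | 18dc0b80-312d-45fd-83e2-14c5949033cb |
| 7784ef2f-070b-4b7a-8530-938a48220daf | node3.maas | COMPUTE | SSH | False | False | 18dc0b80-312d-45fd-83e2-14c5949033cb |
"""

NOTIFICATION_LIST = """
| notification_uuid | generated_time | status | type | source_host_uuid | payload |
| 958ae386-9da0-4399-950e-3529b581aeb0 | 2020-06-04T15:40:08.000000 | running | COMPUTE_HOST | 053d88f9-76f4-4689-bb78-e1f33d4fe65d | {'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status': 'NORMAL'} |
| 59e2d55f-b48e-442b-854f-7bd37de4a8c6 | 2020-06-04T15:32:01.000000 | finished | COMPUTE_HOST | 73290564-ceff-488b-bcf9-961236e6fb2b | {'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status': 'NORMAL'} |
| 149350cb-7e9f-4e5b-bcc9-63bf9001e90f | 2020-06-04T15:17:56.000000 | finished | COMPUTE_HOST | 5fc79709-56ea-4129-9bca-69f7aaa4214a | {'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status': 'NORMAL'} |
| 80d00f33-cb7c-4237-b237-645797d32ade | 2020-06-04T15:04:53.000000 | finished | COMPUTE_HOST | 2d286880-09fc-4ab4-8ff3-a7cb597e378a | {'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status': 'NORMAL'} |
"""


def parse_table(text: str) -> List[Dict[str, str]]:
    """Parse an OpenStack CLI pipe table into a list of dicts keyed by header."""
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip "+---+" borders and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if not header:
            header = cells  # first pipe row is the header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def find_blockers(
    hosts: List[Dict[str, str]],
    notifications: List[Dict[str, str]],
    blocking_statuses: Iterable[str] = ("running",),
) -> Dict[str, List[str]]:
    """Map host name -> UUIDs of notifications that (by assumption) pin it."""
    name_by_uuid = {host["uuid"]: host["name"] for host in hosts}
    statuses = set(blocking_statuses)
    blockers: Dict[str, List[str]] = {}
    for note in notifications:
        host_name = name_by_uuid.get(note["source_host_uuid"])
        if host_name is not None and note["status"] in statuses:
            blockers.setdefault(host_name, []).append(note["notification_uuid"])
    return blockers


if __name__ == "__main__":
    hosts = parse_table(HOST_LIST)
    notifications = parse_table(NOTIFICATION_LIST)
    for name, uuids in find_blockers(hosts, notifications).items():
        print(f"{name} is pinned by notification(s): {', '.join(uuids)}")
```

Run against the output above, this prints "node2.maas is pinned by notification(s): 958ae386-9da0-4399-950e-3529b581aeb0". That is the same host (053d88f9-76f4-4689-bb78-e1f33d4fe65d) that the 409 refuses to update. The other notifications are finished and come from hosts outside this segment's host list.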

Steps to reproduce:
- disable all hypervisors but one, e.g. with: openstack compute service set node5.maas nova-compute --disable
- launch a server on the remaining hypervisor
- kill that hypervisor to trigger a Masakari recovery action
- Masakari detects the failure and tries to relocate the server, but since no hypervisor is available, the server goes into ERROR state

Environment: bionic-stein

Note: any cause of the ERROR state triggers this; the steps above are just the easiest way to reproduce it. I first saw it when Glance briefly became unavailable during a relocation and the relocation failed.

David Ames (thedac) wrote:

I *believe* that this bug is a duplicate of LP Bug #1882656 [0]. Liam will be back tomorrow and can confirm or disconfirm.

[0] https://bugs.launchpad.net/charm-masakari/+bug/1882656

David Ames (thedac) wrote:

Just to clarify,

This bug is a duplicate of the upstream Masakari bug #1773765:

https://bugs.launchpad.net/bugs/1773765

Please ignore comment #1
