Masakari evacuate problem

Bug #1978804 reported by fereshteh loghmani
This bug affects 2 people
Affects        Status  Importance  Assigned to  Milestone
kolla-ansible  New     Undecided   Unassigned

Bug Description

Hello,
I use Masakari to migrate servers when a compute node goes down.
I installed these containers on the region (Wallaby version):
masakari-monitors
masakari-engine
masakari-api
hacluster-pacemaker
hacluster-corosync
On the compute servers I installed hacluster-pacemaker-remote and masakari_instancemonitor (Wallaby version).
In OpenStack I created a segment and added 2 hosts to it (in this case the hosts are named R3SG5 and R3SG12).
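For reference, the segment and host registration described above can be sketched with the openstacksdk `instance_ha` proxy (`create_segment` and `create_host`). This is a minimal sketch, not the exact commands used here: the segment name, `recovery_method="auto"`, `service_type="COMPUTE"`, host `type="COMPUTE"` and `control_attributes="SSH"` are assumed values, and the helper name `register_segment` is hypothetical. The connection object is passed in so the helper does not depend on a particular cloud.

```python
"""Register a Masakari failover segment and its hosts (hypothetical helper).

Assumes `conn` is an openstacksdk Connection, e.g. openstack.connect(cloud=...).
"""
from typing import Any, Iterable, List


def register_segment(conn: Any, segment_name: str, hosts: Iterable[str],
                     recovery_method: str = "auto") -> List[Any]:
    # Create the failover segment; "COMPUTE" is the assumed service_type.
    segment = conn.instance_ha.create_segment(
        name=segment_name,
        recovery_method=recovery_method,
        service_type="COMPUTE",
    )
    created = []
    for host_name in hosts:
        # Hosts are created under the segment, addressed by the segment UUID.
        created.append(conn.instance_ha.create_host(
            segment.uuid,
            name=host_name,
            type="COMPUTE",
            control_attributes="SSH",
        ))
    return created
```

Against a real cloud this would be called as `register_segment(openstack.connect(cloud="controller"), "segment1", ["R3SG5", "R3SG12"])`; the cloud name `controller` is taken from the `location` field in the log, and `segment1` is only a placeholder.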
To test the evacuate function, I shut off one of the compute nodes that I had added as a host (in this case R3SG5).
I have attached the log that I found in this file:
/var/log/kolla/masakari/masakari-hostmonitor.log

*********
 INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG5' is 'online' (current: 'online').
 INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG12' is 'online' (current: 'online').
 WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'eth0' is failed.: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
 ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.
 INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG5' is 'offline' (current: 'offline').
 INFO masakarimonitors.ha.masakari [-] Send a notification. {'notification': {'type': 'COMPUTE_HOST', 'hostname': 'R3SG5', 'generated_time': datetime.datetime(2022, 6, 14, 7, 6, 46, 138867), 'payload': {'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status': 'NORMAL'}}}
 INFO masakarimonitors.ha.masakari [-] Response: openstack.instance_ha.v1.notification.Notification(type=COMPUTE_HOST, hostname=R3SG5, generated_time=2022-06-14T07:06:46.138867, payload={'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status': 'NORMAL'}, id=105, notification_uuid=a7364095-cc7d-48f8-b963-c64ba147897c, source_host_uuid=6328f08c-c752-43d5-4689-801d91dd67ec, status=new, created_at=2022-06-14T07:06:47.000000, updated_at=None, location=Munch({'cloud': 'controller', 'region_name': 'RegionThree', 'zone': None, 'project': Munch({'id': 'a75a951b4537478e8cea39a932f830da', 'name': None, 'domain_id': None, 'domain_name': None})}))
 INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG12' is 'online' (current: 'online').
 WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'eth0' is failed.: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
 ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.
 INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG5' is 'offline' (current: 'offline').
 INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG12' is 'online' (current: 'online').
 WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'eth0' is failed.: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
*****************
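To separate what the host monitor did from what happened later in Nova, a small stdlib-only script can summarize the host monitor log: the last reported state of each host, the number of notifications sent per host, and the number of Corosync errors. This is a hypothetical helper written for the line formats visible in the excerpt above, not part of masakari-monitors; the regular expressions are assumptions based only on those lines.

```python
import re
from typing import Dict, Iterable, Tuple

# Matches lines like: ... [-] 'R3SG5' is 'offline' (current: 'offline').
STATE_RE = re.compile(r"'(?P<host>[^']+)' is '(?P<state>online|offline)'")
# Matches the hostname inside a "Send a notification." payload.
NOTIFY_RE = re.compile(r"Send a notification\..*'hostname': '(?P<host>[^']+)'")


def summarize(lines: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, int], int]:
    """Return (last state per host, notifications per host, corosync errors)."""
    states: Dict[str, str] = {}
    notified: Dict[str, int] = {}
    corosync_errors = 0
    for line in lines:
        if m := STATE_RE.search(line):
            states[m.group("host")] = m.group("state")
        elif m := NOTIFY_RE.search(line):
            host = m.group("host")
            notified[host] = notified.get(host, 0) + 1
        elif "ERROR" in line and "Corosync communication is failed" in line:
            corosync_errors += 1
    return states, notified, corosync_errors
```

It can be run as `summarize(open("/var/log/kolla/masakari/masakari-hostmonitor.log"))`. On the excerpt above it would report R3SG5 as offline with one notification sent for it, which suggests the host monitor detected the failure and handed it to masakari-api, and that the problem lies further along, in the recovery step.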

I also checked the nova_scheduler logs, and there I found this error:

***
ERROR oslo_messaging.rpc.server nova.exception.NoValidHost: No valid host was found. There are not enough hosts available.
*******
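`NoValidHost` means the scheduler filtered out every candidate host for the request. The following is a toy model of filter-based scheduling, not Nova's actual code: the `Host` fields and the three filters (not the failed source host, compute service up and enabled, enough free RAM and vCPUs) are simplified assumptions chosen only to illustrate how a two-host segment with one host down can be left with no candidates.

```python
from dataclasses import dataclass
from typing import List


class NoValidHost(Exception):
    """Raised when no candidate survives filtering (toy stand-in for Nova's)."""


@dataclass
class Host:
    name: str
    service_up: bool   # compute service reported as up (simplified)
    enabled: bool      # service not disabled by an operator (simplified)
    free_ram_mb: int
    free_vcpus: int


def select_destination(hosts: List[Host], source: str,
                       ram_mb: int, vcpus: int) -> Host:
    filters = [
        lambda h: h.name != source,                 # skip the failed source host
        lambda h: h.service_up and h.enabled,       # compute service must be usable
        lambda h: h.free_ram_mb >= ram_mb and h.free_vcpus >= vcpus,  # fits the VM
    ]
    candidates = [h for h in hosts if all(f(h) for f in filters)]
    if not candidates:
        raise NoValidHost("No valid host was found. "
                          "There are not enough hosts available.")
    # Toy weigher: prefer the candidate with the most free RAM.
    return max(candidates, key=lambda h: h.free_ram_mb)
```

In this model, if R3SG12 is disabled or lacks room for the VM's flavor, the only remaining host is filtered out and the call fails with the same message as the scheduler log above. The resource numbers used to try it are hypothetical.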

Finally, in the notifications section of the OpenStack dashboard, the notification status changed from running to failed. After the notification failed, the VM that was on R3SG5 went into ERROR state; it still exists on R3SG5 and was not migrated to R3SG12.
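The running-to-failed transition can also be followed outside the dashboard. Below is a minimal polling sketch; the `get_status` callable is an assumption that keeps the helper testable without a cloud, and the set of final statuses (`finished`, `failed`, `ignored`) is this sketch's own choice. With openstacksdk, `get_status` could be `lambda: conn.instance_ha.get_notification("a7364095-cc7d-48f8-b963-c64ba147897c").status`, using the `notification_uuid` from the host monitor log.

```python
import time
from typing import Callable

# Statuses this sketch treats as final (an assumption; "error" is left out
# because masakari-engine may still retry such notifications).
FINAL_STATUSES = {"finished", "failed", "ignored"}


def wait_for_notification(get_status: Callable[[], str],
                          timeout: float = 300.0,
                          interval: float = 5.0,
                          sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until a final status or the timeout; return the last status."""
    deadline = time.monotonic() + timeout
    status = get_status()
    while status not in FINAL_STATUSES and time.monotonic() < deadline:
        sleep(interval)
        status = get_status()
    return status
```

If it returns `failed`, the nova_scheduler error shown above is the natural next place to look.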

Could you please help me understand why the evacuate function doesn't work correctly?
