kolla-ansible

masakari evacuate problem

Bug #1978804 reported by fereshteh loghmani on 2022-06-15

This bug affects 2 people

Affects		Status	Importance	Assigned to	Milestone
	kolla-ansible	New	Undecided	Unassigned

Bug Description

Hello,
I use Masakari to evacuate instances when a compute node goes down.
I installed these containers in the region (Wallaby version):
masakari-monitors
masakari-engine
masakari-api
hacluster-pacemaker
hacluster-corosync
On the compute servers I installed hacluster-pacemaker-remote and masakari_instancemonitor (Wallaby version).
In OpenStack I created a segment and added 2 hosts to it (in this case the hosts are named R3SG5 and R3SG12).
To test the evacuate function, I shut down one of the compute hosts that I had added to the segment
(in this case I shut down R3SG5).
Here is the log that I found in this file:
/var/log/kolla/masakari/masakari-hostmonitor.log

*********
INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG5' is 'online' (current: 'online').
INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG12' is 'online' (current: 'online').
WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'eth0' is failed.: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.
INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG5' is 'offline' (current: 'offline').
INFO masakarimonitors.ha.masakari [-] Send a notification. {'notification': {'type': 'COMPUTE_HOST', 'hostname': 'R3SG5', 'generated_time': datetime.datetime(2022, 6, 14, 7, 6, 46, 138867), 'payload': {'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status': 'NORMAL'}}}
INFO masakarimonitors.ha.masakari [-] Response: openstack.instance_ha.v1.notification.Notification(type=COMPUTE_HOST, hostname=R3SG5, generated_time=2022-06-14T07:06:46.138867, payload={'event': 'STOPPED', 'cluster_status': 'OFFLINE', 'host_status': 'NORMAL'}, id=105, notification_uuid=a7364095-cc7d-48f8-b963-c64ba147897c, source_host_uuid=6328f08c-c752-43d5-4689-801d91dd67ec, status=new, created_at=2022-06-14T07:06:47.000000, updated_at=None, location=Munch({'cloud': 'controller', 'region_name': 'RegionThree', 'zone': None, 'project': Munch({'id': 'a75a951b4537478e8cea39a932f830da', 'name': None, 'domain_id': None, 'domain_name': None})}))
INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG12' is 'online' (current: 'online').
WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'eth0' is failed.: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.
INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG5' is 'offline' (current: 'offline').
INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG12' is 'online' (current: 'online').
WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'eth0' is failed.: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
*****************
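
To make the sequence in the log above easier to follow, here is a minimal sketch that extracts the last reported status of each host and counts the Corosync failure errors. It assumes only the line format shown in the excerpt; the names STATUS_RE, SAMPLE, latest_host_status and count_corosync_failures are hypothetical helpers written for this illustration and are not part of masakari-monitors.

```python
import re

# Matches hostmonitor status lines such as:
#   ... [-] 'R3SG5' is 'offline' (current: 'offline').
# The WARNING line about 'eth0' does not match, because "is failed" is not quoted.
STATUS_RE = re.compile(
    r"'(?P<host>[^']+)' is '(?P<status>[^']+)' \(current: '(?P<current>[^']+)'\)"
)

# A few lines copied from the excerpt above (hypothetical sample input).
SAMPLE = """\
INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG5' is 'online' (current: 'online').
INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG12' is 'online' (current: 'online').
ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.
INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'R3SG5' is 'offline' (current: 'offline').
"""


def latest_host_status(log_text):
    """Return a dict mapping each hostname to the last status reported for it."""
    latest = {}
    for line in log_text.splitlines():
        if "host_handler.handle_host" not in line:
            continue
        match = STATUS_RE.search(line)
        if match:
            # Later lines overwrite earlier ones, so the final value is the latest.
            latest[match.group("host")] = match.group("status")
    return latest


def count_corosync_failures(log_text):
    """Count the ERROR lines that report a Corosync communication failure."""
    return sum(
        1
        for line in log_text.splitlines()
        if "Corosync communication is failed." in line
    )


if __name__ == "__main__":
    print(latest_host_status(SAMPLE))
    print(count_corosync_failures(SAMPLE))
```

Running it against the full masakari-hostmonitor.log instead of SAMPLE would show whether R3SG5 stays 'offline' while R3SG12 stays 'online' for the whole test.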

I also checked the nova_scheduler logs, where I found this error:

***
ERROR oslo_messaging.rpc.server nova.exception.NoValidHost: No valid host was found. There are not enough hosts available.
*******

Finally, in the notification section of the OpenStack dashboard, the status changed from running to failed. After the failed state appeared in the notification section, my VM that was on R3SG5 went into the ERROR state. The VM is still on R3SG5 and has not been migrated to R3SG12.

Could you please help me understand why the evacuate function doesn't work correctly?
