[kolla-ansible][wallaby][masakari] - Instance HA is not working [ERROR] - Host with name <xxxx> could not be found

Bug #1991137 reported by Nilesh
This bug affects 3 people
Affects: masakari-monitors
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

* Configured Masakari with kolla-ansible

```
 (venv) root@09a465c6cb30:~# ansible --version
ansible 2.10.17
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /root/templates/venv/lib/python3.8/site-packages/ansible
  executable location = /root/templates/venv/bin/ansible
  python version = 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]
(venv) root@09a465c6cb30:~#
```

```
(venv) root@09a465c6cb30:/# cat /etc/kolla/globals.yml | grep -i masak
### Masakari
enable_masakari: "yes"
enable_horizon_masakari: "yes"
(venv) root@09a465c6cb30:/#
```

* Ran the deploy

```
kolla-ansible -i /root/templates/multinode deploy
```

* Containers are running on both the control and compute nodes.

```
(venv) root@09a465c6cb30:/# ansible -i /root/templates/multinode control -m shell -a "docker ps | grep -i masak"
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
192.168.122.92 | CHANGED | rc=0 >>
22e7467e0b39 kolla/centos-source-masakari-monitors:wallaby "dumb-init --single-…" 29 minutes ago Up 29 minutes masakari_hostmonitor
32f317ea486a kolla/centos-source-masakari-engine:wallaby "dumb-init --single-…" 29 minutes ago Up 29 minutes masakari_engine
166f9a9d3a74 kolla/centos-source-masakari-api:wallaby "dumb-init --single-…" 29 minutes ago Up 29 minutes masakari_api
192.168.122.61 | CHANGED | rc=0 >>
e48591f9328a kolla/centos-source-masakari-monitors:wallaby "dumb-init --single-…" 29 minutes ago Up 29 minutes masakari_hostmonitor
548256891d39 kolla/centos-source-masakari-engine:wallaby "dumb-init --single-…" 29 minutes ago Up 29 minutes masakari_engine
45858f2fd020 kolla/centos-source-masakari-api:wallaby "dumb-init --single-…" 29 minutes ago Up 29 minutes masakari_api
192.168.122.67 | CHANGED | rc=0 >>
76d72c44175e kolla/centos-source-masakari-monitors:wallaby "dumb-init --single-…" 29 minutes ago Up 29 minutes masakari_hostmonitor
dd170de0e235 kolla/centos-source-masakari-engine:wallaby "dumb-init --single-…" 29 minutes ago Up 29 minutes masakari_engine
13b674758c01 kolla/centos-source-masakari-api:wallaby "dumb-init --single-…" 29 minutes ago Up 29 minutes masakari_api
(venv) root@09a465c6cb30:
```

```
(venv) root@09a465c6cb30:/# ansible -i /root/templates/multinode compute -m shell -a "docker ps | grep -i masak"
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
192.168.122.76 | CHANGED | rc=0 >>
a85f569112c3 kolla/centos-source-masakari-monitors:wallaby "dumb-init --single-…" 29 minutes ago Up 29 minutes masakari_instancemonitor
192.168.122.68 | CHANGED | rc=0 >>
3d82da299042 kolla/centos-source-masakari-monitors:wallaby "dumb-init --single-…" 29 minutes ago Up 29 minutes masakari_instancemonitor
192.168.122.87 | CHANGED | rc=0 >>
b2553ccfe8c8 kolla/centos-source-masakari-monitors:wallaby "dumb-init --single-…" 29 minutes ago Up 29 minutes masakari_instancemonitor
(venv) root@09a465c6cb30:/#
```

* Pacemaker is also running well.

```
Cluster Summary:
  * Stack: corosync
  * Current DC: punit-infra-node-1 (version 2.1.4-5.el8-dc6eb4362e) - partition with quorum
  * Last updated: Wed Sep 28 14:01:38 2022
  * Last change: Wed Sep 28 14:01:35 2022 by root via cibadmin on punit-infra-node-0
  * 6 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ punit-infra-node-0 punit-infra-node-1 punit-infra-node-2 ]
  * RemoteOnline: [ punit-compute-ha-node-0 punit-compute-ha-node-1 punit-compute-node-0 ]

Active Resources:
  * punit-compute-node-0 (ocf::pacemaker:remote): Started punit-infra-node-0
  * punit-compute-ha-node-0 (ocf::pacemaker:remote): Started punit-infra-node-1
  * punit-compute-ha-node-1 (ocf::pacemaker:remote): Started punit-infra-node-2
```

* Launched an instance with ``--property HA_Enabled=True`` already set.

```
(venv) root@09a465c6cb30:~# openstack server show --fit 835fe77f-9e41-4655-bb8f-08ba926aa5c7
+-------------------------------------+----------------------------------------------------------+
| Field | Value |
+-------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | punit-compute-ha-node-1 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | punit-compute-ha-node-1 |
| OS-EXT-SRV-ATTR:instance_name | instance-0000000c |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2022-09-28T15:09:58.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | private=2.2.2.145 |
| config_drive | |
| created | 2022-09-28T15:09:49Z |
| flavor | m1.med (7162050e-0fe4-4ceb-aa1b-1171e51a92bd) |
| hostId | b53f514659a953657f19e1767078dafde212277d0e71adb8bf05b202 |
| id | 835fe77f-9e41-4655-bb8f-08ba926aa5c7 |
| image | centos7 (d6714905-bd93-40b0-96af-5c2bf74b2e1e) |
| key_name | None |
| name | centos-hb-2022-09-28_15-09-00 |
| progress | 0 |
| project_id | 0ac5ebe0375246328d7e884f0333f6bc |
| properties | HA_Enabled='True' |
| security_groups | name='secgroup1' |
| status | ACTIVE |
| updated | 2022-09-28T16:13:18Z |
| user_id | 705da7af68e6453693f42a62be45838b |
| volumes_attached | |
+-------------------------------------+----------------------------------------------------------+
```

* On the compute node, forcefully crashed the VM:

```
(nova-libvirt)[root@punit-compute-ha-node-1 /]# virsh list --all
 Id Name State
-----------------------------------
 6 instance-0000000c running

(nova-libvirt)[root@punit-compute-ha-node-1 /]# pkill -f -9 instance-0000000c
```

* The instance shut down; it was neither migrated to another node nor restarted.

++++++
ERROR:
++++++

```
2022-09-28 16:14:41.276 7 WARNING masakarimonitors.ha.masakari [-] Retry sending a notification. (BadRequestException: 400: Client Error for url: http://192.168.122.101:15868/v1/notifications, Host with name punit-compute-ha-node-1 could not be found.): openstack.exceptions.BadRequestException: BadRequestException: 400: Client Error for url: http://192.168.122.101:15868/v1/notifications, Host with name punit-compute-ha-node-1 could not be found.
```
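The host name that the Masakari API rejected can be pulled out of the monitor log mechanically, which makes it easier to compare against other host lists. This is a minimal sketch, assuming the message wording shown in the log line above; the function name `rejected_hosts` is hypothetical and not part of masakari-monitors.

```python
import re

# Pattern assumed from the log line in this report; other releases may word it differently.
HOST_NOT_FOUND = re.compile(r"Host with name (\S+) could not be found")


def rejected_hosts(log_text):
    """Return the distinct host names reported as unknown, in order of appearance."""
    found = []
    for name in HOST_NOT_FOUND.findall(log_text):
        if name not in found:
            found.append(name)
    return found


sample = (
    "2022-09-28 16:14:41.276 7 WARNING masakarimonitors.ha.masakari [-] "
    "Retry sending a notification. (BadRequestException: 400: Client Error "
    "for url: http://192.168.122.101:15868/v1/notifications, Host with name "
    "punit-compute-ha-node-1 could not be found.)"
)
print(rejected_hosts(sample))
```

The monitor retries the notification, so the same name may repeat many times in a real log; the function keeps only the first occurrence of each.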

```
(venv) root@09a465c6cb30:~# openstack compute service list | grep -i compu
| cefe25db-0b27-4bac-9f95-2a4dea558be5 | nova-compute | punit-compute-node-0 | nova | enabled | up | 2022-09-28T16:26:51.000000 |
| 9ca98abe-6613-4616-bf45-9e86f950fba1 | nova-compute | punit-compute-ha-node-0 | nova | enabled | up | 2022-09-28T16:26:48.000000 |
| fe1d68b4-0b9c-4b71-9a61-661bcbb5ed3b | nova-compute | punit-compute-ha-node-1 | nova | enabled | up | 2022-09-28T16:26:48.000000 |
(venv) root@09a465c6cb30:~#
```

* Tried setting host and hostname manually; that did not work.

```
[root@punit-compute-ha-node-1 ~]# cat /etc/kolla/masakari-instancemonitor/masakari-monitors.conf
[DEFAULT]
debug = true
log_dir = /var/log/kolla/masakari
host = punit-compute-ha-node-1
[api]
region = RegionOne
auth_url = http://192.168.122.101:35357
user_domain_id = default
project_name = service
project_domain_id = default
username = masakari
password = 65JaR623pXSk6Vhwr5ET8W3QoZcNZorEZo8tavD8
cafile =
api_interface = internal

[libvirt]
connection_uri = qemu+tcp://192.168.122.68/system

[root@punit-compute-ha-node-1 ~]#
```
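To confirm which `host` value a given monitor config actually carries, the file can be read with Python's standard `configparser`. This is only a sketch: the helper name `monitor_host` is hypothetical, the sample text is a trimmed copy of the file above with the password left out, and it says nothing about how the monitor behaves when `host` is unset.

```python
import configparser


def monitor_host(conf_text):
    """Return the `host` value from the [DEFAULT] section, or None if it is absent."""
    # Interpolation is disabled so that a '%' in a password cannot break parsing.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.defaults().get("host")


sample_conf = """\
[DEFAULT]
debug = true
log_dir = /var/log/kolla/masakari
host = punit-compute-ha-node-1
[api]
region = RegionOne
cafile =

[libvirt]
connection_uri = qemu+tcp://192.168.122.68/system
"""
print(monitor_host(sample_conf))
```

On a compute node, the same function could be fed the contents of `/etc/kolla/masakari-instancemonitor/masakari-monitors.conf`.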

++++++++++++
Expectation:
++++++++++++

* Masakari should kick in, restart the service, and bring the VM back up.

* Any help would be appreciated.

Tags: masakari
Matthew Heler (mheler) wrote :
Nilesh (cnilesh) wrote :

Hi Matthew,

Thanks for the reply. Can you be more specific about where I need to change that? An example would help, please.

For Masakari and HACluster to work properly, the hostnames used
in HACluster need to match with the hostnames used in Nova.
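The hostname requirement quoted above can be checked by comparing the two name lists directly. A minimal sketch, assuming the Pacemaker remote node names and the Nova compute host names have already been collected by hand; `hostname_mismatches` is a hypothetical helper, and the FQDN in the second call is made up for illustration.

```python
def hostname_mismatches(cluster_nodes, nova_hosts):
    """Report names that appear on only one side; the comparison is exact and case-sensitive."""
    cluster = set(cluster_nodes)
    nova = set(nova_hosts)
    return {
        "only_in_cluster": sorted(cluster - nova),
        "only_in_nova": sorted(nova - cluster),
    }


# Names copied from the Pacemaker "RemoteOnline" list and from
# `openstack compute service list` in this report.
remote_nodes = ["punit-compute-ha-node-0", "punit-compute-ha-node-1", "punit-compute-node-0"]
nova_hosts = ["punit-compute-node-0", "punit-compute-ha-node-0", "punit-compute-ha-node-1"]
print(hostname_mismatches(remote_nodes, nova_hosts))

# Hypothetical case: one side uses a fully qualified name (invented for illustration).
print(hostname_mismatches(["punit-compute-ha-node-1.example.com"], ["punit-compute-ha-node-1"]))
```

With the names as pasted in this report, both lists come back empty, so a short-name versus FQDN difference of the kind shown in the second call is not visible in the output above.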

Matthew Heler (mheler) wrote :

This is a kolla-ansible issue, resolved in https://review.opendev.org/c/openstack/kolla-ansible/+/870591/3
