Partitioned RabbitMQ Cluster: Hostname controller-0.internalapi.xxx.local is illegal - Could not auto-cluster with...
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Triaged | High | Unassigned |
Description
===========
I've deployed an OpenStack platform on a farm of 10 bare-metal nodes (3 controllers + 7 compute nodes).
Although the deployment finished successfully, the compute nodes could not register themselves with the controllers (the outputs of `openstack hypervisor list` and `openstack server list` are empty).
When I checked nova-compute.log on the compute nodes, I found many errors like this:
```
2019-09-28 22:07:36.142 8 ERROR oslo_service.
```
Since the errors are clearly related to RabbitMQ, I checked the RabbitMQ cluster on the controllers, and it seems the cluster was never formed during the deployment.
Here are the RabbitMQ logs from the controllers:
```
============> controller-0 <============
=ERROR REPORT==== 28-Sep-
** System NOT running to use fully qualified hostnames **
** Hostname controller-
=WARNING REPORT==== 28-Sep-
Could not auto-cluster with <email address hidden>: {badrpc,
============> controller-1 <============
=ERROR REPORT==== 28-Sep-
** System NOT running to use fully qualified hostnames **
** Hostname controller-
=WARNING REPORT==== 28-Sep-
Could not auto-cluster with <email address hidden>: {badrpc,
============> controller-2 <============
=ERROR REPORT==== 28-Sep-
** System NOT running to use fully qualified hostnames **
** Hostname controller-
=WARNING REPORT==== 28-Sep-
Could not auto-cluster with <email address hidden>: {badrpc,
```
And here is the `cluster_status` output from each controller:
```
============> controller-0 <============
[root@controller-0 ~]# podman exec -it rabbitmq /usr/sbin/
Cluster status of node 'rabbit@
[{nodes,
{running_
{cluster_
{partitions,[]},
{alarms,
============> controller-1 <============
[root@controller-1 ~]# podman exec -it rabbitmq /usr/sbin/
Cluster status of node 'rabbit@
[{nodes,
{running_
{cluster_
{partitions,[]},
{alarms,
============> controller-2 <============
[root@controller-2 ~]# podman exec -it rabbitmq /usr/sbin/
Cluster status of node 'rabbit@
[{nodes,
{running_
{cluster_
{partitions,[]},
{alarms,
```
So it seems that RabbitMQ's auto-cluster function fails because of an FQDN issue.
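The FQDN suspicion matches, as far as I can tell, the host-name check behind the `System NOT running to use fully qualified hostnames` / `Hostname ... is illegal` messages in Erlang's distribution layer: a node started with short names (`-sname`) refuses a peer whose host part contains a dot, and a node started with long names (`-name`) refuses a host part without a dot unless it is an IP address. Below is a simplified Python model of that rule, not Erlang's actual code; the function names and the example host name are hypothetical.

```python
import ipaddress


def split_node(node: str) -> tuple[str, str]:
    """Split an Erlang node name such as 'rabbit@host' into (name, host)."""
    name, sep, host = node.partition("@")
    if not sep or not name or not host:
        raise ValueError(f"illegal node name: {node!r}")
    return name, host


def is_legal_peer(peer: str, use_longnames: bool) -> bool:
    """Simplified model of the host-name check behind the
    'Hostname ... is illegal' messages (hypothetical helper)."""
    _, host = split_node(peer)
    fully_qualified = "." in host
    if use_longnames:
        # -name nodes want a dotted host name, but also accept an IP address.
        if fully_qualified:
            return True
        try:
            ipaddress.ip_address(host)
            return True
        except ValueError:
            return False
    # -sname nodes reject any host part that contains a dot.
    return not fully_qualified


# A short-name node asked to cluster with an FQDN peer (hypothetical host).
peer = "rabbit@controller-0.internalapi.example.local"
print(is_legal_peer(peer, use_longnames=False))  # False
print(is_legal_peer("rabbit@controller-0", use_longnames=False))  # True
```

In this model the fix is a matter of consistency: either every node and every `cluster_nodes` entry uses short names, or every one uses long names.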
Here is the part of rabbitmq.config related to the `cluster_nodes` configuration:
```
[root@controller-0 ~]# cat /var/lib/
{cluster_nodes, {['<email address hidden>', '<email address hidden>', '<email address hidden>'], disc}},
[root@controller-1 ~]# cat /var/lib/
{cluster_nodes, {['<email address hidden>', '<email address hidden>', '<email address hidden>'], disc}},
[root@controller-2 ~]# cat /var/lib/
{cluster_nodes, {['<email address hidden>', '<email address hidden>', '<email address hidden>'], disc}},
```
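For auto-clustering to work, each `cluster_nodes` entry presumably has to use the same naming mode as the node reading it. A minimal sketch that renders a `cluster_nodes` line in the same shape as the one above from a list of controller host names, either with the domain stripped (short names) or with the full FQDN (long names). The helper name and host names are hypothetical, and which mode TripleO should use is exactly the open question in this bug.

```python
def render_cluster_nodes(hosts: list[str], use_longnames: bool,
                         node_type: str = "disc") -> str:
    """Render a rabbitmq.config `cluster_nodes` tuple (hypothetical helper).

    With short names, everything after the first dot of each host is dropped;
    with long names, the FQDN is kept as-is.
    """
    if node_type not in ("disc", "ram"):
        raise ValueError(f"unknown node type: {node_type!r}")
    names = []
    for host in hosts:
        host_part = host if use_longnames else host.split(".", 1)[0]
        names.append(f"'rabbit@{host_part}'")
    return "{cluster_nodes, {[" + ", ".join(names) + "], " + node_type + "}},"


hosts = [
    "controller-0.internalapi.example.local",  # hypothetical host names
    "controller-1.internalapi.example.local",
]
print(render_cluster_nodes(hosts, use_longnames=False))
# {cluster_nodes, {['rabbit@controller-0', 'rabbit@controller-1'], disc}},
```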
I think the /usr/share/
Steps to reproduce
==================
1. Install undercloud
2. Deploy overcloud with:
openstack overcloud deploy \
--timeout 120 \
--templates \
-r ~/templates/
-n ~/templates/
-e ~/custom-
-e ~/custom-
-e ~/templates/
-e ~/templates/
-e ~/templates/
-e ~/templates/
-e ~/templates/
-e ~/templates/
-e ~/templates/
-e ~/custom-
-e ~/custom-
-e ~/custom-
-e ~/templates/
-e ~/templates/
3. After the deployment, check the output of `openstack hypervisor list` and `openstack server list`.
4. Check `rabbitmqctl cluster_status` in the RabbitMQ container on each controller.
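Step 4 boils down to checking whether every controller sees the other two as running. A minimal sketch of that check for the Erlang-term `cluster_status` format used in this report (older RabbitMQ releases; newer `rabbitmqctl` versions print a different layout, so the regex is an assumption). The sample text and node names are hypothetical, and collecting the output from each controller is left out.

```python
import re

# Matches the running_nodes entry of the Erlang-term cluster_status output.
RUNNING_NODES = re.compile(r"\{running_nodes,\s*\[([^\]]*)\]\}")


def running_nodes(status_text: str) -> set[str]:
    """Extract the running_nodes list from cluster_status output."""
    match = RUNNING_NODES.search(status_text)
    if match is None:
        raise ValueError("no running_nodes entry found")
    # Node atoms may or may not be quoted, so strip quotes defensively.
    return {n.strip().strip("'") for n in match.group(1).split(",") if n.strip()}


def missing_nodes(status_text: str, expected: set[str]) -> set[str]:
    """Return the expected cluster members this node does not see as running."""
    return expected - running_nodes(status_text)


# Hypothetical output from a node running standalone.
sample = """[{nodes,[{disc,['rabbit@controller-0']}]},
 {running_nodes,['rabbit@controller-0']},
 {partitions,[]}]"""
expected = {"rabbit@controller-0", "rabbit@controller-1", "rabbit@controller-2"}
print(sorted(missing_nodes(sample, expected)))
# ['rabbit@controller-1', 'rabbit@controller-2']
```

An empty result on all three controllers would correspond to the expected result below.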
Expected result
===============
The RabbitMQ cluster should be formed properly.
Actual result
=============
Three RabbitMQ nodes, each running in standalone mode.
Environment
===========
1. Stein
2. 10 bare-metal nodes (3 controllers + 7 compute (HCI) nodes)
3. My network environment file includes:
CloudName: overcloud0001.
CloudDomain: test.local
Changed in tripleo:
importance: Undecided → High
milestone: none → ussuri-1
status: New → Triaged
Changed in tripleo:
milestone: ussuri-1 → ussuri-2
Changed in tripleo:
milestone: ussuri-2 → ussuri-3
Changed in tripleo:
milestone: ussuri-3 → ussuri-rc3
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3
I believe we do not configure RABBITMQ_NODENAME in the Puppet modules; see https://tickets.puppetlabs.com/browse/MODULES-1673
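If RABBITMQ_NODENAME is indeed not set, the node name falls back to a default. My understanding (an assumption, worth checking against the rabbitmq-env script of the deployed RabbitMQ version) is that the default is `rabbit@` plus the host name with the domain stripped, unless `RABBITMQ_USE_LONGNAME=true` is set, in which case the full host name is kept. A sketch of that fallback, showing how it can disagree with an FQDN entry in `cluster_nodes`; the helper names and the host name are hypothetical.

```python
def default_nodename(hostname: str, use_longname: bool) -> str:
    """Simplified model of the default node name (assumed behaviour)."""
    if use_longname:
        return f"rabbit@{hostname}"
    # Short names: keep only the first label of the host name.
    return f"rabbit@{hostname.split('.', 1)[0]}"


def effective_nodename(env: dict[str, str], hostname: str) -> str:
    """An explicit RABBITMQ_NODENAME wins; otherwise use the default."""
    if env.get("RABBITMQ_NODENAME"):
        return env["RABBITMQ_NODENAME"]
    use_longname = env.get("RABBITMQ_USE_LONGNAME", "").lower() == "true"
    return default_nodename(hostname, use_longname)


host = "controller-0.internalapi.example.local"  # hypothetical FQDN
configured = "rabbit@controller-0.internalapi.example.local"  # hypothetical cluster_nodes entry
local = effective_nodename({}, host)
print(local, local == configured)  # rabbit@controller-0 False
```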
There was also (a long time ago, though) some naming-convention weirdness, IIUC; see https:/