[networking_templates] RabbitMQ server uses 'management' network for replication even if 'mgmt/messaging' role is assigned to another network
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Released | High | Kyrylo Galanov |
8.0.x | Fix Released | High | Kyrylo Galanov |
Bug Description
After successfully adding a new controller node to an environment that had 1 ready controller and was deployed with a network template (almost all services are moved to separate networks), OSTF tests for RabbitMQ (HA) start to fail:
- RabbitMQ availability (failure) Number of RabbitMQ nodes is not equal to number of cluster nodes.
- RabbitMQ replication (failure) Failed to establish AMQP connection to 5673/tcp port on 10.200.240.4 from controller node! Please refer to OpenStack logs for more details.
Steps to reproduce:
1. Deploy environment with 1 controller node and network template [0]
2. Add new controller node
3. Deploy changes
Expected result: after deployment cluster passes health checks
Actual: HA health checks for RabbitMQ fail, the new controller can't join the AMQP cluster
Here is part of the pacemaker logs:
http://
As you can see, the 'mgmt/messaging' role is assigned to a separate (isolated) network in the template:
root@node-4:~# python -c 'import yaml; print yaml.load(
br-messaging
root@node-4:~# ip -o -4 a sh dev br-messaging
42: br-messaging inet 10.200.240.4/24 brd 10.200.240.255 scope global br-messaging\ valid_lft forever preferred_lft forever
AMQP on node-1 (the old controller) is reachable from node-4 (the new controller) via that network, but inaccessible via the common 'management' net:
root@node-4:~# hiera amqp_hosts
10.200.240.4:5673, 10.200.240.1:5673
root@node-4:~# ip -o r g 10.200.240.1
10.200.240.1 dev br-messaging src 10.200.240.4 \ cache
root@node-4:~# nc -w 2 -z 10.200.240.1 4369 && echo Connected || echo Failed
Connected
root@node-4:~# nc -w 2 -z 10.200.240.1 5673 && echo Connected || echo Failed
Connected
root@node-4:~# nc -w 2 -z node-1 4369 && echo Connected || echo Failed
Failed
root@node-4:~# nc -w 2 -z node-1 5673 && echo Connected || echo Failed
Failed
root@node-4:~# host node-1
node-1.
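The per-endpoint checks above can be scripted. A minimal sketch, assuming the `amqp_hosts` format shown in the hiera output earlier (`IP:PORT, IP:PORT`); the endpoint string is hard-coded here from that output, while on a live node it would come from `hiera amqp_hosts`. The script only prints the `nc` probe commands, so it is safe to run anywhere:

```shell
#!/bin/sh
# Sketch: turn the hiera amqp_hosts string into per-endpoint nc probes.
# AMQP_HOSTS is copied from the output above; on a real node use
# AMQP_HOSTS="$(hiera amqp_hosts)" instead.
AMQP_HOSTS="10.200.240.4:5673, 10.200.240.1:5673"
PROBES=$(echo "$AMQP_HOSTS" | tr ',' '\n' | while read -r endpoint; do
  [ -n "$endpoint" ] || continue
  host="${endpoint%:*}"
  port="${endpoint##*:}"
  # probe both the EPMD port (4369) and the advertised AMQP port
  for p in 4369 "$port"; do
    printf 'nc -w 2 -z %s %s\n' "$host" "$p"
  done
done)
echo "$PROBES"
```

Piping the printed commands through `sh` would reproduce the manual checks above for every endpoint at once.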
This is because the firewall blocks AMQP traffic in the 'management' network:
root@node-1:~# iptables -L INPUT -n -v | grep -E '4369|5673'
0 0 ACCEPT tcp -- * * 10.109.20.2 0.0.0.0/0 multiport sports 4369,5672,
119 7140 ACCEPT tcp -- * * 10.200.240.0/24 0.0.0.0/0 multiport ports 4369,5672,
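The rule audit can be sketched as a small script over the captured `iptables -L INPUT -n -v` output. `RULES` below is pasted from the listing above (including its truncated port lists), and `10.109.21.0/24` is the management CIDR used in the workaround rule further down; this is a text-only check, not a live firewall query:

```shell
#!/bin/sh
# Sketch: look for an ACCEPT rule covering the management network in
# captured iptables output. RULES is pasted from the log above.
RULES='    0     0 ACCEPT tcp -- * * 10.109.20.2 0.0.0.0/0 multiport sports 4369,5672,
  119  7140 ACCEPT tcp -- * * 10.200.240.0/24 0.0.0.0/0 multiport ports 4369,5672,'
MGMT_NET="10.109.21.0/24"
if echo "$RULES" | grep -q "ACCEPT.*$MGMT_NET"; then
  VERDICT="management network allowed"
else
  VERDICT="management network blocked"
fi
echo "$VERDICT"
```

On the captured output the verdict is "management network blocked": only a single management host (10.109.20.2) and the messaging net (10.200.240.0/24) are accepted, which matches the failed `nc` checks by hostname.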
After I added the following rule to iptables on both controllers, the RabbitMQ cluster was successfully assembled:
iptables -I INPUT -s 10.109.21.0/24 -p tcp -m multiport --ports 4369,5672,
This means that RabbitMQ continues to use the 'management' network for clustering even if the 'mgmt/messaging' network role is assigned to a different network, most probably due to the use of host names instead of IPs.
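The hostname hypothesis can be checked by comparing what a node name resolves to against the messaging CIDR (10.200.240.0/24, from the logs above). A sketch, with a loudly assumed value: `RESOLVED` is a hypothetical management-network address for node-1 (10.109.21.x is a guess consistent with the CIDR in the workaround rule); on a live node it would come from `getent hosts node-1`:

```shell
#!/bin/sh
# Sketch: does the address a node name resolves to fall inside the
# isolated messaging network? On a real node:
#   RESOLVED=$(getent hosts node-1 | awk '{print $1}')
in_messaging_net() {
  # /24 shortcut: a prefix match is enough for this netmask
  case "$1" in 10.200.240.*) return 0 ;; *) return 1 ;; esac
}
RESOLVED="10.109.21.4"   # ASSUMED: hypothetical management IP of node-1
if in_messaging_net "$RESOLVED"; then
  MSG="node-1 resolves into the messaging network"
else
  MSG="node-1 resolves outside the messaging network"
fi
echo "$MSG"
```

If the resolved address lands outside 10.200.240.0/24, RabbitMQ's name-based clustering will cross the management network regardless of the 'mgmt/messaging' role assignment.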
tags: added: area-library
tags: added: team-network
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Kyrylo Galanov (kgalanov)
status: New → In Progress
tags: added: blocked
tags: removed: blocked
no longer affects: fuel/mitaka
Changed in fuel:
milestone: 8.0 → 9.0
We cannot give up on using FQDN node names for RabbitMQ because we need to support TLS in the future.
That makes this issue purely a DNS/hosts-file problem. Perhaps name resolution should be fixed so that node names resolve to the expected network.
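One possible direction along the lines of the comment above is to pin node names to their messaging-network addresses so clustering lands on br-messaging. A sketch only: the messaging IPs are taken from the logs above (10.200.240.4 for node-4, 10.200.240.1 for node-1), but the node FQDNs are truncated in the log, so real entries would use the actual domain; this script merely prints candidate /etc/hosts lines and does not modify anything:

```shell
#!/bin/sh
# Sketch: emit candidate /etc/hosts pins mapping node names to their
# messaging-network IPs (IPs from the logs above; real entries would
# also carry each node's FQDN, which is not shown in the log).
PINS=$(printf '%s\n' \
  "10.200.240.1 node-1" \
  "10.200.240.4 node-4")
echo "$PINS"
```

With such pins in place, name-based clustering and the FQDN requirement for future TLS support could coexist, since names would resolve into the messaging network.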