[networking_templates] RabbitMQ server uses 'management' network for replication even if 'mgmt/messaging' role is assigned to another network

Bug #1528707 reported by Artem Panchenko on 2015-12-22
This bug affects 2 people
Affects              Importance   Assigned to
Fuel for OpenStack   High         Kyrylo Galanov
8.0.x                High         Kyrylo Galanov

Bug Description

After a new controller node was successfully added to an environment that had 1 ready controller and was deployed with a network template (almost all services are moved to separate networks), the OSTF tests for RabbitMQ (HA) started to fail:

  - RabbitMQ availability (failure) Number of RabbitMQ nodes is not equal to number of cluster nodes.
  - RabbitMQ replication (failure) Failed to establish AMQP connection to 5673/tcp port on 10.200.240.4 from controller node! Please refer to OpenStack logs for more details.

Steps to reproduce:

1. Deploy environment with 1 controller node and network template [0]
2. Add new controller node
3. Deploy changes

Expected result: after deployment, the cluster passes health checks

Actual result: HA health checks for RabbitMQ fail; the new controller can't join the AMQP cluster

Here is part of the Pacemaker logs:

http://paste.openstack.org/show/482555/

As you can see, the 'mgmt/messaging' role is assigned to a separate (isolated) network in the template:

root@node-4:~# python -c 'import yaml; print yaml.load(open("/etc/astute.yaml"))["network_scheme"]["roles"]["mgmt/messaging"]'
br-messaging

root@node-4:~# ip -o -4 a sh dev br-messaging
42: br-messaging inet 10.200.240.4/24 brd 10.200.240.255 scope global br-messaging\ valid_lft forever preferred_lft forever

AMQP on node-1 (old controller) is reachable from node-4 (new controller) via that network, but inaccessible via the common 'management' network:

root@node-4:~# hiera amqp_hosts
10.200.240.4:5673, 10.200.240.1:5673

root@node-4:~# ip -o r g 10.200.240.1
10.200.240.1 dev br-messaging src 10.200.240.4 \ cache

root@node-4:~# nc -w 2 -z 10.200.240.1 4369 && echo Connected || echo Failed
Connected
root@node-4:~# nc -w 2 -z 10.200.240.1 5673 && echo Connected || echo Failed
Connected
root@node-4:~# nc -w 2 -z node-1 4369 && echo Connected || echo Failed
Failed
root@node-4:~# nc -w 2 -z node-1 5673 && echo Connected || echo Failed
Failed

root@node-4:~# host node-1
node-1.test.domain.local has address 10.109.21.4
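
The nc probes above can also be scripted. The following is a minimal sketch, not part of the original report: the helper name port_open is made up for illustration, and it only mirrors what `nc -w 2 -z` does (attempt a TCP connection with a 2-second timeout).

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper that mirrors `nc -w 2 -z HOST PORT`.
    Name-resolution failures, refusals and timeouts all count as "Failed".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.gaierror and socket.timeout are both subclasses of OSError
        return False
```

On node-4 it could be called for each pair from the session above, for example port_open("10.200.240.1", 5673) for the messaging address and port_open("node-1", 5673) for the host name that resolves into the 'management' network.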

This is because the firewall blocks AMQP traffic on the 'management' network:

root@node-1:~# iptables -L INPUT -n -v | grep -E '4369|5673'
    0 0 ACCEPT tcp -- * * 10.109.20.2 0.0.0.0/0 multiport sports 4369,5672,41055,55672,61613 /* 003 remote rabbitmq */
  119 7140 ACCEPT tcp -- * * 10.200.240.0/24 0.0.0.0/0 multiport ports 4369,5672,5673,41055 /* 106 rabbitmq from 10.200.240.0/24 */

After I added the following rule to iptables on both controllers, the RabbitMQ cluster was assembled successfully:

iptables -I INPUT -s 10.109.21.0/24 -p tcp -m multiport --ports 4369,5672,5673,41055 -j ACCEPT

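This means that RabbitMQ continues to use the 'management' network for clustering even if the 'mgmt/messaging' network role is assigned to a different network, most probably because cluster nodes are addressed by host names, which resolve to 'management' addresses (node-1 resolves to 10.109.21.4), instead of by IPs.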
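
This mismatch can be checked mechanically. The sketch below is illustrative only: the function name resolves_into is an assumption, and the addresses are the ones shown in the session above (br-messaging is 10.200.240.4/24, node-1 resolves to 10.109.21.4).

```python
import ipaddress


def resolves_into(address, interface_cidr):
    """Return True if `address` belongs to the network of `interface_cidr`.

    `interface_cidr` is an interface address with prefix length,
    e.g. the "10.200.240.4/24" reported for br-messaging.
    """
    network = ipaddress.ip_interface(interface_cidr).network
    return ipaddress.ip_address(address) in network


# Address that the host name node-1 resolves to (from `host node-1`)
name_based = resolves_into("10.109.21.4", "10.200.240.4/24")
# Address of node-1 listed in `hiera amqp_hosts`
ip_based = resolves_into("10.200.240.1", "10.200.240.4/24")
print(name_based, ip_based)
```

The name-based address falls outside the messaging network while the amqp_hosts address falls inside it, which is consistent with clustering traffic leaving via the 'management' network.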

[0] https://github.com/openstack/fuel-qa/blob/901a27e56656b49f3d43092ef922fccc25f7c50e/fuelweb_test/network_templates/cinder.yaml

Maciej Relewicz (rlu) on 2015-12-23
tags: added: area-library
tags: added: team-network
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Kyrylo Galanov (kgalanov)
status: New → In Progress
Bogdan Dobrelya (bogdando) wrote :

We cannot give up on using FQDN node names for RabbitMQ because we need to support TLS in the future.
That makes this issue purely DNS/hosts-file related. Perhaps the names should simply resolve to the expected addresses.

Bogdan Dobrelya (bogdando) wrote :

For DNS resolution, we could perhaps use SRV [0] records.
However, nodes rely on /etc/hosts instead, AFAIK.

So we could just as well use net-template-based FQDNs instead, like
messaging-node*-domain.local 1.2.3.4
corosync-node*-domain.local 5.6.7.8
database-node*-domain.local 9.10.11.12

and rely on *these* FQDNs instead.

[0] https://en.wikipedia.org/wiki/SRV_record
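
A rough sketch of how such prefixed entries could be generated for /etc/hosts follows. It is not the fuel-library implementation: the function name, the shape of the NODES mapping, the "messaging-" prefix and the FQDN of node-4 are all assumptions made for illustration; only the IPs and the node-1 FQDN come from the bug description.

```python
def prefixed_hosts_entries(nodes, role, prefix):
    """Build /etc/hosts lines that map <prefix><fqdn> to each node's IP for `role`.

    `nodes` is a hypothetical mapping:
        node name -> {"fqdn": str, "network_roles": {role name: ip}}
    Nodes that have no address for `role` are skipped.
    """
    entries = []
    for name in sorted(nodes):
        node = nodes[name]
        ip = node.get("network_roles", {}).get(role)
        if ip is None:
            continue
        entries.append("{}\t{}{}".format(ip, prefix, node["fqdn"]))
    return entries


# Illustrative data only; the structure and the node-4 FQDN are assumed.
NODES = {
    "node-1": {
        "fqdn": "node-1.test.domain.local",
        "network_roles": {"mgmt/messaging": "10.200.240.1"},
    },
    "node-4": {
        "fqdn": "node-4.test.domain.local",
        "network_roles": {"mgmt/messaging": "10.200.240.4"},
    },
}

for line in prefixed_hosts_entries(NODES, "mgmt/messaging", "messaging-"):
    print(line)
```

With entries like these, a service that addresses its peers by the prefixed name would reach them over the network that carries the role, while the plain node names keep resolving to 'management' addresses.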

Sergey Vasilenko (xenolog) wrote :

I agree with Bogdan.
I am now testing a dedicated hostname for VM live migration. This approach works.

tags: added: blocked
Sergey Shevorakov (sshevorakov) wrote :

This bug is a swarm-blocker, since it causes 4 test cases to fail.

tags: added: swarm-blocker
Dmitry Pyzhov (dpyzhov) on 2016-01-11
tags: removed: blocked

Reviewed: https://review.openstack.org/262486
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=68bcee4af6fb707ca8ad3fe3fbc831b98a1d27cd
Submitter: Jenkins
Branch: master

commit 68bcee4af6fb707ca8ad3fe3fbc831b98a1d27cd
Author: Sergey Vasilenko <email address hidden>
Date: Wed Dec 30 15:56:29 2015 +0300

    Implement prefixed hostnames support

    in the network_metadata_to_hosts() parser function.

    This functional required for separation nova live migration
    and RMQ-server to dedicated networks.

    Change-Id: I9813fa8c20d47e0ef1e251fe5ac8d01d08fe7703
    Related-bug: 1528707

Dmitry Pyzhov (dpyzhov) on 2016-01-14
no longer affects: fuel/mitaka
Changed in fuel:
milestone: 8.0 → 9.0

Reviewed: https://review.openstack.org/262535
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=ca65b718de8c3ca015d72416643a88a78f09b7ce
Submitter: Jenkins
Branch: master

commit ca65b718de8c3ca015d72416643a88a78f09b7ce
Author: Kyrylo Galanov <email address hidden>
Date: Tue Dec 29 12:41:26 2015 +0200

    Introduce node name prefix for mgmt/messaging IPs

    RabbitMQ will resolve <prefix>-<fqdn> hostnames to valid mgmt/messaging IP

    Change-Id: Ifc2af16b08663655d365587ea6f45c87bfc68698
    Depends-On: I9813fa8c20d47e0ef1e251fe5ac8d01d08fe7703
    Closes-bug: #1528707

Changed in fuel:
status: In Progress → Fix Committed

Change abandoned by Kyrylo Galanov (<email address hidden>) on branch: stable/8.0
Review: https://review.openstack.org/269637

Change abandoned by Kyrylo Galanov (<email address hidden>) on branch: stable/8.0
Review: https://review.openstack.org/269676

Change abandoned by Kyrylo Galanov (<email address hidden>) on branch: stable/8.0
Review: https://review.openstack.org/269754

Reviewed: https://review.openstack.org/269637
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=80e093db6448eb5dab236d89fca0257edfcec2f7
Submitter: Jenkins
Branch: stable/8.0

commit 80e093db6448eb5dab236d89fca0257edfcec2f7
Author: Sergey Vasilenko <email address hidden>
Date: Wed Dec 30 15:56:29 2015 +0300

    Implement prefixed hostnames support

    in the network_metadata_to_hosts() parser function.

    This functional required for separation nova live migration
    and RMQ-server to dedicated networks.

    Change-Id: I9813fa8c20d47e0ef1e251fe5ac8d01d08fe7703
    Related-bug: 1528707
    (cherry picked from commit 68bcee4af6fb707ca8ad3fe3fbc831b98a1d27cd)

Reviewed: https://review.openstack.org/270098
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=5724477c83cb0aad179484a8f60eca9e026f9715
Submitter: Jenkins
Branch: stable/8.0

commit 5724477c83cb0aad179484a8f60eca9e026f9715
Author: Kyrylo Galanov <email address hidden>
Date: Tue Dec 29 12:41:26 2015 +0200

    Introduce node name prefix for mgmt/messaging IPs

    RabbitMQ will resolve <prefix>-<fqdn> hostnames to valid mgmt/messaging IP

    Change-Id: Ifc2af16b08663655d365587ea6f45c87bfc68698
    Depends-On: I9813fa8c20d47e0ef1e251fe5ac8d01d08fe7703
    Closes-bug: #1528707
    DocImpact: https://bugs.launchpad.net/fuel/+bug/1535383
    (cherry picked from commit ca65b718de8c3ca015d72416643a88a78f09b7ce)

Nastya Urlapova (aurlapova) wrote :

Verified on ISO #515

Alexey Galkin (agalkin) wrote :

Verified as fixed in 9.0-242.

Results:
http://paste.openstack.org/show/495455

Changed in fuel:
status: Fix Committed → Fix Released