[networking_templates] RabbitMQ server uses 'management' network for replication even if 'mgmt/messaging' role is assigned to another network

Bug #1528707 reported by Artem Panchenko
This bug affects 2 people
Affects             Status        Importance  Assigned to     Milestone
Fuel for OpenStack  Fix Released  High        Kyrylo Galanov
8.0.x               Fix Released  High        Kyrylo Galanov

Bug Description

After a new controller node was successfully added to an environment that had 1 ready controller and was deployed with a network template (almost all services moved to separate networks), the OSTF tests for RabbitMQ (HA) started to fail:

  - RabbitMQ availability (failure) Number of RabbitMQ nodes is not equal to number of cluster nodes.
  - RabbitMQ replication (failure) Failed to establish AMQP connection to 5673/tcp port on 10.200.240.4 from controller node! Please refer to OpenStack logs for more details.

Steps to reproduce:

1. Deploy environment with 1 controller node and network template [0]
2. Add new controller node
3. Deploy changes

Expected result: after deployment, the cluster passes health checks

Actual result: HA health checks for RabbitMQ fail; the new controller can't join the AMQP cluster

Here is part of the Pacemaker logs:

http://paste.openstack.org/show/482555/

As you can see, the 'mgmt/messaging' role is assigned to a separate (isolated) network in the template:

root@node-4:~# python -c 'import yaml; print yaml.load(open("/etc/astute.yaml"))["network_scheme"]["roles"]["mgmt/messaging"]'
br-messaging

root@node-4:~# ip -o -4 a sh dev br-messaging
42: br-messaging inet 10.200.240.4/24 brd 10.200.240.255 scope global br-messaging\ valid_lft forever preferred_lft forever

AMQP on node-1 (old controller) is reachable from node-4 (new controller) via that network, but not via the common 'management' network, which is where the node-1 host name resolves:

root@node-4:~# hiera amqp_hosts
10.200.240.4:5673, 10.200.240.1:5673

root@node-4:~# ip -o r g 10.200.240.1
10.200.240.1 dev br-messaging src 10.200.240.4 \ cache

root@node-4:~# nc -w 2 -z 10.200.240.1 4369 && echo Connected || echo Failed
Connected
root@node-4:~# nc -w 2 -z 10.200.240.1 5673 && echo Connected || echo Failed
Connected
root@node-4:~# nc -w 2 -z node-1 4369 && echo Connected || echo Failed
Failed
root@node-4:~# nc -w 2 -z node-1 5673 && echo Connected || echo Failed
Failed

root@node-4:~# host node-1
node-1.test.domain.local has address 10.109.21.4
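
The `nc` checks above can be repeated for any set of targets and ports with a short script. This is only a sketch: the function names `port_open` and `compare_paths` are made up for illustration, and the targets in the usage note are the ones from this report.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def compare_paths(targets, ports, timeout=2.0):
    """Map each (target, port) pair to 'Connected' or 'Failed', like the nc one-liners."""
    return {
        (target, port): "Connected" if port_open(target, port, timeout) else "Failed"
        for target in targets
        for port in ports
    }
```

Calling `compare_paths(["10.200.240.1", "node-1"], [4369, 5673])` on node-4 would test both the messaging IP and the host name, which resolves to the 'management' address.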

This is because the firewall blocks AMQP traffic on the 'management' network:

root@node-1:~# iptables -L INPUT -n -v | grep -E '4369|5673'
    0 0 ACCEPT tcp -- * * 10.109.20.2 0.0.0.0/0 multiport sports 4369,5672,41055,55672,61613 /* 003 remote rabbitmq */
  119 7140 ACCEPT tcp -- * * 10.200.240.0/24 0.0.0.0/0 multiport ports 4369,5672,5673,41055 /* 106 rabbitmq from 10.200.240.0/24 */

After I added the following rule to iptables on both controllers, the RabbitMQ cluster assembled successfully:

iptables -I INPUT -s 10.109.21.0/24 -p tcp -m multiport --ports 4369,5672,5673,41055 -j ACCEPT

This means that RabbitMQ continues to use the 'management' network for clustering even if the 'mgmt/messaging' network role is assigned to a different network, most probably because host names are used instead of IPs.
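
To illustrate why the connection from the 'management' address is rejected, here is a small model of the rule matching. It is a sketch under assumptions: only rule 106 from the iptables output is modelled (rule 003 matches a single source host and source ports, so it does not apply to node-4), anything unmatched is assumed to be dropped, and `is_accepted` is a hypothetical helper, not part of any tool.

```python
import ipaddress

# Source network and ports copied from rule 106 in the iptables output.
ACCEPT_RULES = [
    ("10.200.240.0/24", {4369, 5672, 5673, 41055}),
]


def is_accepted(src_ip, dst_port, rules=ACCEPT_RULES):
    """Return True if some rule accepts traffic from src_ip to dst_port."""
    addr = ipaddress.ip_address(src_ip)
    return any(
        addr in ipaddress.ip_network(net) and dst_port in ports
        for net, ports in rules
    )


# Same rule set with the manual workaround rule appended.
PATCHED_RULES = ACCEPT_RULES + [("10.109.21.0/24", {4369, 5672, 5673, 41055})]
```

With these rules, `is_accepted("10.200.240.4", 5673)` is true, `is_accepted("10.109.21.4", 5673)` is false, and `is_accepted("10.109.21.4", 5673, PATCHED_RULES)` is true, which matches the behaviour seen before and after the workaround.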

[0] https://github.com/openstack/fuel-qa/blob/901a27e56656b49f3d43092ef922fccc25f7c50e/fuelweb_test/network_templates/cinder.yaml

Artem Panchenko (apanchenko-8) wrote :
Maciej Relewicz (rlu)
tags: added: area-library
tags: added: team-network
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Kyrylo Galanov (kgalanov)
status: New → In Progress
Bogdan Dobrelya (bogdando) wrote :

We cannot give up using FQDN node names for RabbitMQ, because we need to support TLS in the future.
That makes this issue purely DNS/hosts-file related. Perhaps the names should simply resolve as expected.

Bogdan Dobrelya (bogdando) wrote :

For DNS resolution, we could perhaps use SRV [0] records.
However, nodes rely on /etc/hosts instead, AFAIK.

So we could just as well use network-template-based FQDNs instead, like
messaging-node*-domain.local 1.2.3.4
corosync-node*-domain.local 5.6.7.8
database-node*-domain.local 9.10.11.12

and rely on *these* FQDNs instead.

[0] https://en.wikipedia.org/wiki/SRV_record
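
The per-network FQDN idea above can be sketched as follows. This is an illustration only, not the fuel-library implementation: the node data layout, the `messaging-` prefix, the node-4 FQDN and the function name `prefixed_hosts` are all assumptions. Lines are emitted in /etc/hosts order (IP first, then name).

```python
def prefixed_hosts(nodes, role, prefix):
    """Build /etc/hosts-style lines mapping '<prefix><fqdn>' to a node's IP for one network role."""
    lines = []
    for node in nodes:
        ip = node["network_roles"].get(role)
        if ip is None:
            # Node has no address for this role; skip it.
            continue
        lines.append(f"{ip}\t{prefix}{node['fqdn']}")
    return lines


# Hypothetical node metadata; the IPs are the messaging addresses from this report.
NODES = [
    {"fqdn": "node-1.test.domain.local",
     "network_roles": {"mgmt/messaging": "10.200.240.1"}},
    {"fqdn": "node-4.test.domain.local",
     "network_roles": {"mgmt/messaging": "10.200.240.4"}},
]

HOSTS_LINES = prefixed_hosts(NODES, "mgmt/messaging", "messaging-")
```

A RabbitMQ node named after the prefixed FQDN would then resolve to the 'mgmt/messaging' address rather than the 'management' one.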

Sergey Vasilenko (xenolog) wrote :

I agree with Bogdan.
I am now testing a dedicated hostname for VM live migration. This approach works.

Kyrylo Galanov (kgalanov) wrote :
tags: added: blocked
Sergey Shevorakov (sshevorakov) wrote :

This bug is a swarm-blocker, since it causes 4 test cases to fail.

tags: added: swarm-blocker
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/262486

Kyrylo Galanov (kgalanov) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/262535

Dmitry Pyzhov (dpyzhov)
tags: removed: blocked
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/262486
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=68bcee4af6fb707ca8ad3fe3fbc831b98a1d27cd
Submitter: Jenkins
Branch: master

commit 68bcee4af6fb707ca8ad3fe3fbc831b98a1d27cd
Author: Sergey Vasilenko <email address hidden>
Date: Wed Dec 30 15:56:29 2015 +0300

    Implement prefixed hostnames support

    in the network_metadata_to_hosts() parser function.

    This functional required for separation nova live migration
    and RMQ-server to dedicated networks.

    Change-Id: I9813fa8c20d47e0ef1e251fe5ac8d01d08fe7703
    Related-bug: 1528707

Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/mitaka
Changed in fuel:
milestone: 8.0 → 9.0
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/262535
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=ca65b718de8c3ca015d72416643a88a78f09b7ce
Submitter: Jenkins
Branch: master

commit ca65b718de8c3ca015d72416643a88a78f09b7ce
Author: Kyrylo Galanov <email address hidden>
Date: Tue Dec 29 12:41:26 2015 +0200

    Introduce node name prefix for mgmt/messaging IPs

    RabbitMQ will resolve <prefix>-<fqdn> hostnames to valid mgmt/messaging IP

    Change-Id: Ifc2af16b08663655d365587ea6f45c87bfc68698
    Depends-On: I9813fa8c20d47e0ef1e251fe5ac8d01d08fe7703
    Closes-bug: #1528707

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (stable/8.0)

Related fix proposed to branch: stable/8.0
Review: https://review.openstack.org/269637

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/8.0)

Change abandoned by Kyrylo Galanov (<email address hidden>) on branch: stable/8.0
Review: https://review.openstack.org/269637

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (stable/8.0)

Related fix proposed to branch: stable/8.0
Review: https://review.openstack.org/269676

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/269754

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/8.0)

Change abandoned by Kyrylo Galanov (<email address hidden>) on branch: stable/8.0
Review: https://review.openstack.org/269676

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/8.0)

Change abandoned by Kyrylo Galanov (<email address hidden>) on branch: stable/8.0
Review: https://review.openstack.org/269754

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/270098

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (stable/8.0)

Reviewed: https://review.openstack.org/269637
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=80e093db6448eb5dab236d89fca0257edfcec2f7
Submitter: Jenkins
Branch: stable/8.0

commit 80e093db6448eb5dab236d89fca0257edfcec2f7
Author: Sergey Vasilenko <email address hidden>
Date: Wed Dec 30 15:56:29 2015 +0300

    Implement prefixed hostnames support

    in the network_metadata_to_hosts() parser function.

    This functional required for separation nova live migration
    and RMQ-server to dedicated networks.

    Change-Id: I9813fa8c20d47e0ef1e251fe5ac8d01d08fe7703
    Related-bug: 1528707
    (cherry picked from commit 68bcee4af6fb707ca8ad3fe3fbc831b98a1d27cd)

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/8.0)

Reviewed: https://review.openstack.org/270098
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=5724477c83cb0aad179484a8f60eca9e026f9715
Submitter: Jenkins
Branch: stable/8.0

commit 5724477c83cb0aad179484a8f60eca9e026f9715
Author: Kyrylo Galanov <email address hidden>
Date: Tue Dec 29 12:41:26 2015 +0200

    Introduce node name prefix for mgmt/messaging IPs

    RabbitMQ will resolve <prefix>-<fqdn> hostnames to valid mgmt/messaging IP

    Change-Id: Ifc2af16b08663655d365587ea6f45c87bfc68698
    Depends-On: I9813fa8c20d47e0ef1e251fe5ac8d01d08fe7703
    Closes-bug: #1528707
    DocImpact: https://bugs.launchpad.net/fuel/+bug/1535383
    (cherry picked from commit ca65b718de8c3ca015d72416643a88a78f09b7ce)

Nastya Urlapova (aurlapova) wrote :

Verified on ISO #515

Alexey Galkin (agalkin) wrote :

Verified as fixed in 9.0-242.

Results:
http://paste.openstack.org/show/495455

Changed in fuel:
status: Fix Committed → Fix Released