[networking_templates] RabbitMQ server uses 'management' network for replication even if 'mgmt/messaging' role is assigned to another network
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Released | High | Kyrylo Galanov |
8.0.x | Fix Released | High | Kyrylo Galanov |
Bug Description
After successfully adding a new controller node to an environment that had 1 ready controller and was deployed with a network template (almost all services are moved to separate networks), OSTF tests for RabbitMQ (HA) start to fail:
- RabbitMQ availability (failure) Number of RabbitMQ nodes is not equal to number of cluster nodes.
- RabbitMQ replication (failure) Failed to establish AMQP connection to 5673/tcp port on 10.200.240.4 from controller node! Please refer to OpenStack logs for more details.
Steps to reproduce:
1. Deploy environment with 1 controller node and network template [0]
2. Add new controller node
3. Deploy changes
Expected result: after deployment cluster passes health checks
Actual: HA health checks for RabbitMQ fail, the new controller can't join the AMQP cluster
Here is part of the pacemaker logs:
http://
As you can see, the 'mgmt/messaging' role is assigned to a separate (isolated) network in the template:
root@node-4:~# python -c 'import yaml; print yaml.load(
br-messaging
root@node-4:~# ip -o -4 a sh dev br-messaging
42: br-messaging inet 10.200.240.4/24 brd 10.200.240.255 scope global br-messaging\ valid_lft forever preferred_lft forever
AMQP on node-1 (the old controller) is reachable from node-4 (the new controller) via that network, but inaccessible via the common 'management' net:
root@node-4:~# hiera amqp_hosts
10.200.240.4:5673, 10.200.240.1:5673
root@node-4:~# ip -o r g 10.200.240.1
10.200.240.1 dev br-messaging src 10.200.240.4 \ cache
root@node-4:~# nc -w 2 -z 10.200.240.1 4369 && echo Connected || echo Failed
Connected
root@node-4:~# nc -w 2 -z 10.200.240.1 5673 && echo Connected || echo Failed
Connected
root@node-4:~# nc -w 2 -z node-1 4369 && echo Connected || echo Failed
Failed
root@node-4:~# nc -w 2 -z node-1 5673 && echo Connected || echo Failed
Failed
root@node-4:~# host node-1
node-1.
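The per-endpoint checks above can be scripted. A minimal sketch, assuming the `amqp_hosts` format shown in the hiera output earlier (`IP:PORT, IP:PORT`); the endpoint string is hard-coded here from that output, while on a live node it would come from `hiera amqp_hosts`. The script only prints the `nc` probe commands, so it is safe to run anywhere:

```shell
#!/bin/sh
# Sketch: turn the hiera amqp_hosts string into per-endpoint nc probes.
# AMQP_HOSTS is copied from the output above; on a real node use
# AMQP_HOSTS="$(hiera amqp_hosts)" instead.
AMQP_HOSTS="10.200.240.4:5673, 10.200.240.1:5673"
PROBES=$(echo "$AMQP_HOSTS" | tr ',' '\n' | while read -r endpoint; do
  [ -n "$endpoint" ] || continue
  host="${endpoint%:*}"
  port="${endpoint##*:}"
  # probe both the EPMD port (4369) and the advertised AMQP port
  for p in 4369 "$port"; do
    printf 'nc -w 2 -z %s %s\n' "$host" "$p"
  done
done)
echo "$PROBES"
```

Piping the printed commands through `sh` would reproduce the manual checks above for every endpoint at once.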
This is because the firewall blocks AMQP traffic in the 'management' network:
root@node-1:~# iptables -L INPUT -n -v | grep -E '4369|5673'
0 0 ACCEPT tcp -- * * 10.109.20.2 0.0.0.0/0 multiport sports 4369,5672,
119 7140 ACCEPT tcp -- * * 10.200.240.0/24 0.0.0.0/0 multiport ports 4369,5672,
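The rule audit can be sketched as a small script over the captured `iptables -L INPUT -n -v` output. `RULES` below is pasted from the listing above (including its truncated port lists), and `10.109.21.0/24` is the management CIDR used in the workaround rule further down; this is a text-only check, not a live firewall query:

```shell
#!/bin/sh
# Sketch: look for an ACCEPT rule covering the management network in
# captured iptables output. RULES is pasted from the log above.
RULES='    0     0 ACCEPT tcp -- * * 10.109.20.2 0.0.0.0/0 multiport sports 4369,5672,
  119  7140 ACCEPT tcp -- * * 10.200.240.0/24 0.0.0.0/0 multiport ports 4369,5672,'
MGMT_NET="10.109.21.0/24"
if echo "$RULES" | grep -q "ACCEPT.*$MGMT_NET"; then
  VERDICT="management network allowed"
else
  VERDICT="management network blocked"
fi
echo "$VERDICT"
```

On the captured output the verdict is "management network blocked": only a single management host (10.109.20.2) and the messaging net (10.200.240.0/24) are accepted, which matches the failed `nc` checks by hostname.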
After I added the following rule to iptables on both controllers, the RabbitMQ cluster was successfully assembled:
iptables -I INPUT -s 10.109.21.0/24 -p tcp -m multiport --ports 4369,5672,
This means that RabbitMQ continues to use the 'management' network for clustering even if the 'mgmt/messaging' network role is assigned to a different network, most probably due to the use of host names instead of IPs.
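The hostname hypothesis can be checked by comparing what a node name resolves to against the messaging CIDR (10.200.240.0/24, from the logs above). A sketch, with a loudly assumed value: `RESOLVED` is a hypothetical management-network address for node-1 (10.109.21.x is a guess consistent with the CIDR in the workaround rule); on a live node it would come from `getent hosts node-1`:

```shell
#!/bin/sh
# Sketch: does the address a node name resolves to fall inside the
# isolated messaging network? On a real node:
#   RESOLVED=$(getent hosts node-1 | awk '{print $1}')
in_messaging_net() {
  # /24 shortcut: a prefix match is enough for this netmask
  case "$1" in 10.200.240.*) return 0 ;; *) return 1 ;; esac
}
RESOLVED="10.109.21.4"   # ASSUMED: hypothetical management IP of node-1
if in_messaging_net "$RESOLVED"; then
  MSG="node-1 resolves into the messaging network"
else
  MSG="node-1 resolves outside the messaging network"
fi
echo "$MSG"
```

If the resolved address lands outside 10.200.240.0/24, RabbitMQ's name-based clustering will cross the management network regardless of the 'mgmt/messaging' role assignment.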
tags: added: area-library
tags: added: team-network
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Kyrylo Galanov (kgalanov)
status: New → In Progress
tags: added: blocked
tags: removed: blocked
no longer affects: fuel/mitaka
Changed in fuel:
milestone: 8.0 → 9.0
We cannot give up on using FQDN node names for RabbitMQ because we need to support TLS in the future.
That makes this issue purely a DNS/hosts-file problem. Perhaps name resolution should be fixed so that node names resolve to the expected network.
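One possible direction along the lines of the comment above is to pin node names to their messaging-network addresses so clustering lands on br-messaging. A sketch only: the messaging IPs are taken from the logs above (10.200.240.4 for node-4, 10.200.240.1 for node-1), but the node FQDNs are truncated in the log, so real entries would use the actual domain; this script merely prints candidate /etc/hosts lines and does not modify anything:

```shell
#!/bin/sh
# Sketch: emit candidate /etc/hosts pins mapping node names to their
# messaging-network IPs (IPs from the logs above; real entries would
# also carry each node's FQDN, which is not shown in the log).
PINS=$(printf '%s\n' \
  "10.200.240.1 node-1" \
  "10.200.240.4 node-4")
echo "$PINS"
```

With such pins in place, name-based clustering and the FQDN requirement for future TLS support could coexist, since names would resolve into the messaging network.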