SM: rabbitmq clustering fails

Bug #1695662 reported by Senthilnathan Murugappan
This bug affects 1 person
Affects              Status          Importance   Assigned to      Milestone
Juniper Openstack    (status tracked in Trunk)
  R3.2               Fix Committed   Critical     Dheeraj Gautam
  R4.0               Fix Committed   Critical     Dheeraj Gautam
  Trunk              Fix Committed   Critical     Dheeraj Gautam

Bug Description

server1:
=WARNING REPORT==== 2-Jun-2017::21:31:30 ===
Could not auto-cluster with rabbit@server3ctrl: {badrpc,nodedown}
=WARNING REPORT==== 2-Jun-2017::21:31:30 ===
Could not auto-cluster with rabbit@server2ctrl: {error,tables_not_present}

server2:
=WARNING REPORT==== 2-Jun-2017::21:31:28 ===
Could not auto-cluster with rabbit@server1ctrl: {error,mnesia_not_running}
=WARNING REPORT==== 2-Jun-2017::21:31:28 ===
Could not auto-cluster with rabbit@server3ctrl: {badrpc,nodedown}

server3:
=INFO REPORT==== 2-Jun-2017::21:31:31 ===
Node 'rabbit@server1ctrl' selected for auto-clustering

server1 and server2 came up at the same time, so rabbitmq clustering failed.
This appears to be a known rabbitmq issue. The workaround is to start rabbitmq-service on one node first, wait for it to be fully up, and then start rabbitmq on all the other nodes (the other nodes can be started in parallel).

http://rabbitmq.1065348.n5.nabble.com/Rabbitmq-boot-failure-with-quot-tables-not-present-quot-td24494.html#a24512
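
The start order described above can be sketched roughly as follows. This is only an illustration, not the change that was eventually merged: the function names, the timeout values, and the `start` / `is_up` callables are all hypothetical, and how a node is actually started and probed (for example through the init system and `rabbitmqctl status`) is left to the caller.

```python
import time
from typing import Callable, Iterable, List


def wait_until(check: Callable[[], bool], timeout: float = 120.0,
               interval: float = 2.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns True or about `timeout` seconds pass."""
    waited = 0.0
    while True:
        if check():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


def start_in_order(first: str, others: Iterable[str],
                   start: Callable[[str], None],
                   is_up: Callable[[str], bool],
                   **wait_kwargs) -> List[str]:
    """Start `first`, wait until it is fully up, then start the rest.

    `start` and `is_up` are supplied by the caller (hypothetical hooks).
    Raises TimeoutError if the first node never reports up, so the other
    nodes are not started into the same race.
    """
    started = [first]
    start(first)
    if not wait_until(lambda: is_up(first), **wait_kwargs):
        raise TimeoutError(f"{first} did not come up")
    for host in others:
        start(host)  # sequential here; these could also run in parallel
        started.append(host)
    return started
```

With real hooks, a call might look like `start_in_order("server1", ["server2", "server3"], start, is_up)`.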

Andrey Pavlov (apavlov-e) wrote :

build number?

Jeba Paulaiyan (jebap) wrote :

Ubuntu-16.04 Newton
Build 4.0-18

Jeba Paulaiyan (jebap) wrote :

Release-notes:

In an Ubuntu 16.04.2 Newton cluster, the following rabbitmq issue is sometimes seen:

http://rabbitmq.1065348.n5.nabble.com/Rabbitmq-boot-failure-with-quot-tables-not-present-quot-td24494.html#a24512

Workaround:

1) Stop rabbitmq on the node that is not clustered (service stop and epmd -kill).
(In a three-node setup, usually two nodes are clustered together and the third is alone. It can also happen that every rabbitmq node is alone.)
2) Remove the /var/lib/rabbitmq/mnesia directory…
3) Restart rabbitmq on the node.
4) Check the nova service-list output and restart all services that are marked down. (This may be unnecessary, but nova-compute was found not to re-register after rabbitmq went down. This needs to be checked later.)
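
As a rough sketch, the steps above could be collected into a command plan and run on the unclustered node. The service name `rabbitmq-server` is an assumption (the stock Ubuntu package name, not stated in this report), and `recovery_plan` / `run_plan` are hypothetical helper names. Removing the mnesia directory discards that node's local rabbitmq state, so the plan should be reviewed before it is executed.

```python
import subprocess
from typing import Callable, List

Command = List[str]


def recovery_plan(mnesia_dir: str = "/var/lib/rabbitmq/mnesia",
                  service: str = "rabbitmq-server") -> List[Command]:
    """Return the workaround steps as argv lists, in order.

    `service` is an assumed service name; adjust it for the deployment.
    """
    return [
        ["service", service, "stop"],   # step 1: stop rabbitmq
        ["epmd", "-kill"],              # step 1: kill the Erlang port mapper
        ["rm", "-rf", mnesia_dir],      # step 2: remove the mnesia directory
        ["service", service, "start"],  # step 3: restart rabbitmq
        ["nova", "service-list"],       # step 4: look for services marked down
    ]


def run_plan(plan: List[Command],
             runner: Callable[[Command], object] = subprocess.check_call) -> None:
    """Run each command in order; the default runner stops at the first failure."""
    for cmd in plan:
        runner(cmd)
```

Passing `runner=print` gives a dry run that only lists the commands.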

tags: added: releasenote
information type: Proprietary → Public
Jeba Paulaiyan (jebap)
tags: added: blocker
Dheeraj Gautam (dgautam) wrote :

This was happening randomly; it would be helpful to replicate the issue.

Jeba Paulaiyan (jebap) wrote :

Newton 4.1.0.0-39 CB

SM: 10.87.118.170

root@5b10s1-vm1:~# neutron floatingip-list --tenant-id ab6bcf93-eae9-4ae2-b80b-33e964022a8f
+--------------------------------------+------------------+---------------------+--------------------------------------+
| id | fixed_ip_address | floating_ip_address | port_id |
+--------------------------------------+------------------+---------------------+--------------------------------------+
| 2540337f-d970-47f7-83f8-1dbfb33ad584 | 10.0.0.11 | 10.87.118.169 | b9139ba9-b5b5-4a05-aaef-f4c462c9312f |
| 09fa2559-df61-4211-b2c1-57a1e488c98d | 10.0.0.9 | 10.87.118.167 | aae81ba5-6b64-4f0a-94d3-b6cca38a29ad |
| 34e7e4d4-19c3-4468-9f0d-1a59f197d447 | 10.0.0.5 | 10.87.118.159 | db4bdf87-e91c-45ae-918b-f7448021dd80 |
| 7da121b5-c987-4151-80cd-40571a560ecb | 10.0.0.10 | 10.87.118.158 | 435e6f85-02b9-4a08-823d-7a7a35acbc3e |
| ef40dff2-fbd1-4f8d-bd49-9d5eae453ff3 | 10.0.0.6 | 10.87.118.162 | 09a55e09-1393-4982-be40-1ad17047d205 |
| 4596ccec-1fae-4126-86b7-084d8c5b6980 | 10.0.0.7 | 10.87.118.161 | 939b9580-6abb-41f3-a81d-ede3ac8f3086 |
| 786aeacb-b391-4533-ae68-7db3a4facb0a | 10.0.0.13 | 10.87.118.170 | e29cf36f-fa96-4530-9831-ac2b0b499312 |
| 1382e40e-0a34-478a-a63f-258b54d5a70d | 10.0.0.8 | 10.87.118.168 | 50671bb3-23ab-4ef3-b07a-ff881fb1b7bc |
| 5570dcb9-e9f1-4904-8d53-83219f337fa0 | 10.0.0.12 | 10.87.118.160 | 13473394-bc43-4263-b3c9-38d2ff93cbb6 |
| 14a24829-84db-4299-a84e-2eb061813bb1 | 10.0.0.4 | 10.87.118.157 | 407176eb-4b2d-4dcc-abcd-896df2f68174 |
+--------------------------------------+------------------+---------------------+--------------------------------------+
root@5b10s1-vm1:~#

Jeba Paulaiyan (jebap) wrote :
tags: added: sanity
OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.0

Review in progress for https://review.opencontrail.org/35150
Submitter: Dheeraj Gautam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/35150
Committed: http://github.com/Juniper/contrail-puppet/commit/31b0045795a66c64ae47105208ee0844a56add46
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit 31b0045795a66c64ae47105208ee0844a56add46
Author: Dheeraj Gautam <email address hidden>
Date: Thu Aug 31 12:45:14 2017 -0700

non master nodes should wait for master rabbit to come up

Change-Id: I72f6073ae8f0026f1f87c219f95e4c14d67aa938
Closes-Bug: #1695662

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/35202
Submitter: Dheeraj Gautam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/35202
Committed: http://github.com/Juniper/contrail-puppet/commit/803ffcf50a50c8cc59af4b30665f818820d90d10
Submitter: Zuul (<email address hidden>)
Branch: master

commit 803ffcf50a50c8cc59af4b30665f818820d90d10
Author: Dheeraj Gautam <email address hidden>
Date: Thu Aug 31 12:45:14 2017 -0700

non master nodes should wait for master rabbit to come up

Change-Id: I72f6073ae8f0026f1f87c219f95e4c14d67aa938
Closes-Bug: #1695662
(cherry picked from commit 31b0045795a66c64ae47105208ee0844a56add46)

Jeba Paulaiyan (jebap) wrote :

In R3.2 CB #42, this issue is seen:

root@5b9s1-vm1:~# neutron floatingip-list --tenant-id e6f9d9e0-4e99-49c0-a208-3dd4cc42de5b
+--------------------------------------+------------------+---------------------+--------------------------------------+
| id | fixed_ip_address | floating_ip_address | port_id |
+--------------------------------------+------------------+---------------------+--------------------------------------+
| d69ca113-1813-49c7-964d-5411c4731823 | 10.0.0.4 | 10.87.118.28 | 8fa89318-a346-4061-b617-ee8b96b0c387 |
| 6a1062b2-ba68-4dd3-8938-6f73240e7172 | 10.0.0.6 | 10.87.118.16 | b9c9a318-5a26-4d23-81c2-99cf2364572a |
| 335d881c-9328-4cb0-ae4f-2c12804f614b | 10.0.0.7 | 10.87.118.25 | cbbc105b-d912-4093-bfa8-511fe83d2bbc |
| 30fe5c9a-0c00-4fdd-8d7b-9b506e576724 | 10.0.0.9 | 10.87.118.27 | 0dfc09be-4f7a-4068-b511-518438d48f8b |
| f4d14687-80da-4c89-96b7-63975b755d5f | 10.0.0.5 | 10.87.118.26 | 9af0df19-cda9-4168-9e09-86116275d3ce |
| 864b3ef1-73b2-4694-a71c-b12ec5ebf220 | 10.0.0.8 | 10.87.118.17 | 25d06a2a-b0d9-4627-800d-1801cd1fa93e |
+--------------------------------------+------------------+---------------------+--------------------------------------+
root@5b9s1-vm1:~#

root@server1:~# rabbitmqctl cluster_status
Cluster status of node rabbit@server1ctrl ...
[{nodes,[{disc,[rabbit@server1ctrl,rabbit@server3ctrl]}]},
{running_nodes,[rabbit@server3ctrl,rabbit@server1ctrl]},
{cluster_name,<<"<email address hidden>">>},
{partitions,[]}]
root@server1:~#
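
A check for a node that failed to join, such as rabbit@server2ctrl in the output above, might be sketched like this. It assumes the Erlang-term output format shown in this report (other rabbitmqctl versions may print cluster status differently), and the helper names are hypothetical.

```python
import re
from typing import Iterable, List


def running_nodes(cluster_status: str) -> List[str]:
    """Extract node names from the {running_nodes,[...]} term, in printed order."""
    match = re.search(r"\{running_nodes,\s*\[([^\]]*)\]", cluster_status)
    if not match:
        return []
    return [name.strip().strip("'")
            for name in match.group(1).split(",") if name.strip()]


def missing_nodes(cluster_status: str, expected: Iterable[str]) -> List[str]:
    """Return the expected nodes that are not listed as running, sorted."""
    return sorted(set(expected) - set(running_nodes(cluster_status)))
```

The text passed in would be the captured output of `rabbitmqctl cluster_status`; the list of expected node names has to come from the deployment's own configuration.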

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/35314
Submitter: Dheeraj Gautam (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/35314
Committed: http://github.com/Juniper/contrail-puppet/commit/c0bc476d32619fdea94c36ac3ae3538240d3dfc5
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit c0bc476d32619fdea94c36ac3ae3538240d3dfc5
Author: Dheeraj Gautam <email address hidden>
Date: Thu Aug 31 12:45:14 2017 -0700

non master nodes should wait for master rabbit to come up

Change-Id: I72f6073ae8f0026f1f87c219f95e4c14d67aa938
Closes-Bug: #1695662
(cherry picked from commit 31b0045795a66c64ae47105208ee0844a56add46)
(cherry picked from commit 803ffcf50a50c8cc59af4b30665f818820d90d10)
